Abstract

These informal notes deal with a number of questions related to sums 
and integrals in analysis. 
Part I

Basic notions 

1 Real and complex numbers 

Of course, the real numbers $\mathbf{R}$ are contained in the complex numbers
$\mathbf{C}$, and every $z \in \mathbf{C}$ can be expressed as $z = x + y i$, where
$x, y \in \mathbf{R}$ and $i^2 = -1$. In this case, $x$ and $y$ are called the real
and imaginary parts of $z$, respectively. The complex conjugate $\overline{z}$ of $z$
is given by

(1.1)    $\overline{z} = x - y i$.

It is easy to see that

(1.2)    $\overline{z + w} = \overline{z} + \overline{w}$

and

(1.3)    $\overline{z w} = \overline{z} \, \overline{w}$

for every $z, w \in \mathbf{C}$. The modulus $|z|$ of $z$ is given by

(1.4)    $|z| = (x^2 + y^2)^{1/2}$.

Thus

(1.5)    $|z|^2 = z \overline{z}$.

This implies that

(1.6)    $|z w|^2 = (z w) \overline{(z w)} = z w \overline{z} \, \overline{w} = |z|^2 |w|^2$

for every $z, w \in \mathbf{C}$, and hence

(1.7)    $|z w| = |z| |w|$.

Note that the modulus of a real number is the same as its absolute value, and
that the modulus of $z = x + y i \in \mathbf{C}$ is the same as the Euclidean norm of
$(x, y) \in \mathbf{R}^2$.
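As a quick sanity check of (1.3), (1.5), and (1.7), the following sketch uses Python's built-in `complex` type. The particular values of `z` and `w` are arbitrary choices made for illustration and are not taken from the notes.

```python
import math

# Illustrative values (an assumption of this sketch, not from the notes).
z = complex(3, 4)
w = complex(5, -12)

# (1.3): the conjugate of a product is the product of the conjugates.
conj_of_product = (z * w).conjugate()
product_of_conj = z.conjugate() * w.conjugate()

# (1.5): |z|^2 = z * conj(z), which has zero imaginary part.
z_times_conj = z * z.conjugate()

# (1.7): |z w| = |z| |w|.
modulus_of_product = abs(z * w)
product_of_moduli = abs(z) * abs(w)

print(conj_of_product == product_of_conj)
print(z_times_conj)
print(math.isclose(modulus_of_product, product_of_moduli))
```

Floating-point arithmetic is only approximate in general, which is why the last comparison uses `math.isclose` rather than `==`.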
2 Rearrangements



Let $\sum_{j=1}^\infty a_j$ be an infinite series of real or complex numbers. If
$\pi$ is a one-to-one mapping from the set $\mathbf{Z}_+$ of positive integers onto
itself, then the series

(2.1)    $\sum_{j=1}^\infty a_{\pi(j)}$

is said to be a rearrangement of $\sum_{j=1}^\infty a_j$.

Remember that $\sum_{j=1}^\infty a_j$ converges if the sequence of partial sums
$\sum_{j=1}^n a_j$ converges as $n \to \infty$. If $a_j$ is a nonnegative real
number for each $j$, then the partial sums are monotone increasing, and
convergence is equivalent to boundedness of the partial sums. In this case,
convergence of $\sum_{j=1}^\infty a_j$ implies the convergence of every
rearrangement (2.1), and the values of these sums are the same. More precisely,

(2.2)    $\sum_{j=1}^n a_{\pi(j)} \le \sum_{j=1}^N a_j$

when $\pi(1), \ldots, \pi(n) \le N$, so that the boundedness of the partial sums of
$\sum_{j=1}^\infty a_j$ implies the boundedness of the partial sums of (2.1).
Similarly,

(2.3)    $\sum_{j=1}^n a_j \le \sum_{j=1}^N a_{\pi(j)}$

when $\pi^{-1}(1), \ldots, \pi^{-1}(n) \le N$, and these two simple estimates imply
that the suprema of the partial sums of $\sum_{j=1}^\infty a_j$ and (2.1) are the
same.

An infinite series $\sum_{j=1}^\infty a_j$ of real or complex numbers is said to
converge absolutely if $\sum_{j=1}^\infty |a_j|$ converges. It is well known that
absolute convergence implies convergence, by the Cauchy criterion. If
$\sum_{j=1}^\infty a_j$ converges absolutely, then the preceding discussion implies
that (2.1) also converges absolutely, and one can show that the two sums have the
same value. This is trivial when $a_j = 0$ for all but finitely many $j$, and
otherwise $\sum_{j=1}^\infty a_j$ can be approximated by series with this property.
Alternatively, $\sum_{j=1}^\infty a_j$ may be expressed as a linear combination of
convergent series whose terms are nonnegative real numbers, so that the equality
of the sums reduces to the previous case.
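The two estimates (2.2) and (2.3) can be checked on a concrete example. The sketch below is only an illustration: the series $a_j = 1/j^2$ and the rearrangement `pi` (which swaps $2k-1$ and $2k$) are hypothetical choices, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def a(j):
    """Term a_j = 1/j^2 of a convergent series with nonnegative terms."""
    return Fraction(1, j * j)

def pi(j):
    """A hypothetical rearrangement of the positive integers: swap 2k-1 and 2k.

    This map is its own inverse, so it also plays the role of pi^{-1}.
    """
    return j + 1 if j % 2 == 1 else j - 1

def partial_sum(term, n):
    """Sum of term(1), ..., term(n)."""
    return sum(term(j) for j in range(1, n + 1))

n = 25
# Smallest N with pi(1), ..., pi(n) <= N (and pi^{-1}(1), ..., pi^{-1}(n) <= N).
N = max(pi(j) for j in range(1, n + 1))

rearranged_n = partial_sum(lambda j: a(pi(j)), n)
original_N = partial_sum(a, N)
original_n = partial_sum(a, n)
rearranged_N = partial_sum(lambda j: a(pi(j)), N)

print(rearranged_n <= original_N)   # estimate (2.2)
print(original_n <= rearranged_N)   # estimate (2.3)
```

Both comparisons rely on the terms being nonnegative, which is exactly the hypothesis under which (2.2) and (2.3) are stated.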



3 Generalized convergence 



Let $E$ be a nonempty set, and let $f(x)$ be a real or complex-valued function on
$E$. Let us say that $\sum_{x \in E} f(x)$ converges in the generalized sense if
there is a $\lambda \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, such that for
each $\epsilon > 0$ there is a finite set $A_\epsilon \subseteq E$ for which

(3.1)    $\Big| \sum_{x \in B} f(x) - \lambda \Big| < \epsilon$

whenever $B \subseteq E$ is a finite set that satisfies $A_\epsilon \subseteq B$. It
is easy to see that such a $\lambda$ is unique when it exists, in which case
$\sum_{x \in E} f(x)$ is defined to be $\lambda$.

If $f(x)$ has this property and $\pi$ is a one-to-one mapping of $E$ onto itself,
then $f(\pi(x))$ has the same property, and

(3.2)    $\sum_{x \in E} f(\pi(x)) = \sum_{x \in E} f(x)$.

This follows from the fact that

(3.3)    $\sum_{x \in A} f(\pi(x)) = \sum_{x \in \pi(A)} f(x)$

for every finite set $A \subseteq E$. Thus this definition of $\sum_{x \in E} f(x)$
is automatically invariant under rearrangements.

Suppose that $f(x)$ is a nonnegative real number for each $x \in E$. If the
partial sums $\sum_{x \in A} f(x)$ over finite subsets $A$ of $E$ are uniformly
bounded, then $\sum_{x \in E} f(x)$ converges in the generalized sense, and

(3.4)    $\sum_{x \in E} f(x) = \sup \Big\{ \sum_{x \in A} f(x) : A \subseteq E \text{ has only finitely many elements} \Big\}$.

If $f(x)$ is a real or complex-valued function on $E$ such that the sums
$\sum_{x \in A} |f(x)|$ over finite sets $A \subseteq E$ are bounded, then
$\sum_{x \in E} f(x)$ also converges in the generalized sense. This follows by
expressing $f(x)$ as a linear combination of nonnegative real-valued functions
for which the partial sums over finite subsets of $E$ are bounded.

Conversely, if $f(x)$ is a real or complex-valued function on $E$ such that
$\sum_{x \in E} f(x)$ converges in the generalized sense, then the sums
$\sum_{x \in A} |f(x)|$ over finite subsets $A$ of $E$ are uniformly bounded. To
see this, one can take $\epsilon = 1$ in the definition of convergence to get a
finite set $A_1 \subseteq E$ for which the partial sums $\sum_{x \in B} f(x)$ over
finite subsets $B$ of $E$ with $A_1 \subseteq B$ are uniformly bounded. This
implies that the partial sums $\sum_{x \in A} f(x)$ over arbitrary finite sets
$A \subseteq E$ are bounded, by taking $B = A \cup A_1$, and using the fact that
the sums over subsets of $A_1$ are bounded. The boundedness of the partial sums
of $|f(x)|$ can then be obtained by applying this to finite sets $A \subseteq E$
on which $f(x)$ is positive or negative in the real case, or on which the real or
imaginary parts of $f(x)$ are positive or negative in the complex case.

4 Nets 

A partially ordered set $(A, \prec)$ is said to be a directed system if for every
$a, b \in A$ there is a $c \in A$ such that $a, b \prec c$. A net
$\{x_a\}_{a \in A}$ indexed by $A$ assigns to each $a \in A$ an element $x_a$ of a
set $X$. If $X$ is a topological space, then the net $\{x_a\}_{a \in A}$ converges
to $x \in X$ if for every open set $U \subseteq X$ with $x \in U$ there is an
$a \in A$ such that $x_b \in U$ when $b \in A$ and $a \prec b$. This reduces to the
usual definition of convergence of a sequence when $A$ is the set of positive
integers with the standard ordering. Now let $E$ be a nonempty set, and let
$f(x)$ be a real or complex-valued function on $E$. The collection of nonempty
finite subsets of $E$ is partially ordered by inclusion, and defines a directed
system. More precisely, any two finite subsets of $E$ are contained in their
union, which is also a finite subset of $E$. Consider the net associated to this
directed system that assigns to each nonempty finite set $B \subseteq E$ the real
or complex number $\sum_{x \in B} f(x)$. It is easy to see that convergence of
this net in $\mathbf{R}$ or $\mathbf{C}$, as appropriate, is the same as
convergence of $\sum_{x \in E} f(x)$ in the sense described in the previous
section.
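To make the definition of generalized convergence concrete, here is a small sketch for the hypothetical example $f(x) = 2^{-x}$ on $E = \{0, 1, 2, \ldots\}$, whose sum is $\lambda = 2$. The function `witness_set` (a name introduced only for this illustration) produces a finite set $A_\epsilon$ as in (3.1), and the check is then run on a finite superset $B$ chosen at random with a fixed seed.

```python
from fractions import Fraction
import random

def f(x):
    """Hypothetical example: f(x) = 2^{-x} on E = {0, 1, 2, ...}."""
    return Fraction(1, 2 ** x)

LAMBDA = Fraction(2)  # value of the sum of f over all of E

def witness_set(eps):
    """Return A_eps = {0, ..., m}, where m is the first index with 2^{-m} < eps.

    The sum over A_eps is 2 - 2^{-m}, and adding further points only moves
    the sum closer to 2, so every finite B containing A_eps satisfies (3.1).
    """
    m = 0
    while f(m) >= eps:
        m += 1
    return set(range(m + 1))

def within_eps(B, eps):
    """Check the inequality (3.1) for the finite set B."""
    return abs(sum(f(x) for x in B) - LAMBDA) < eps

eps = Fraction(1, 1000)
A_eps = witness_set(eps)
rng = random.Random(0)  # fixed seed so the run is reproducible
extra = set(rng.sample(range(len(A_eps), 200), 20))
B = A_eps | extra
print(within_eps(B, eps))
```

The order in which the points of $B$ are added plays no role here, which reflects the fact that the net of partial sums is indexed by finite sets rather than by an enumeration of $E$.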



5 Norms on vector spaces 

Let $V$ be a vector space over the real or complex numbers. A norm on $V$ is a
nonnegative real-valued function $\|v\|$ defined for $v \in V$ such that
$\|v\| = 0$ if and only if $v = 0$,

(5.1)    $\|t v\| = |t| \, \|v\|$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, and

(5.2)    $\|v + w\| \le \|v\| + \|w\|$

for every $v, w \in V$.

A set $E \subseteq V$ is said to be convex if for every $v, w \in E$ and
$t \in \mathbf{R}$ with $0 \le t \le 1$,

(5.3)    $t v + (1 - t) w \in E$.

If $\|v\|$ is a norm on $V$ and

(5.4)    $B_1 = \{v \in V : \|v\| \le 1\}$

is the corresponding closed unit ball, then it is easy to see that $B_1$ is a
convex set in $V$.

Conversely, suppose that $\|v\|$ is a nonnegative real-valued function on $V$
that satisfies the positivity condition $\|v\| > 0$ when $v \ne 0$ and the
homogeneity condition (5.1). If $B_1$ is convex, then one can show that $\|v\|$
satisfies the triangle inequality (5.2), and hence that $\|v\|$ is a norm. To see
this, let $v, w \in V$ be given, with $v, w \ne 0$, since otherwise (5.2) is
trivial. Put

(5.5)    $v' = \frac{v}{\|v\|}, \quad w' = \frac{w}{\|w\|}$,

so that $\|v'\| = \|w'\| = 1$. Thus $v', w' \in B_1$, and hence

(5.6)    $\|t v' + (1 - t) w'\| \le 1$

when $t \in \mathbf{R}$ and $0 \le t \le 1$, by hypothesis. If
$t = \|v\| / (\|v\| + \|w\|)$, then $1 - t = \|w\| / (\|v\| + \|w\|)$, and

(5.7)    $t v' + (1 - t) w' = \frac{v + w}{\|v\| + \|w\|}$.

Therefore

(5.8)    $\Big\| \frac{v + w}{\|v\| + \|w\|} \Big\| \le 1$,

which implies (5.2), as desired.

6 Bounded functions 

Let $E$ be a nonempty set, and consider the spaces $\ell^\infty(E, \mathbf{R})$,
$\ell^\infty(E, \mathbf{C})$ of real or complex-valued functions on $E$ that are
bounded. It is sometimes convenient to use the notation $\ell^\infty(E)$ to refer
to either of these spaces, which are vector spaces with respect to pointwise
addition and scalar multiplication. The supremum or $\ell^\infty$ norm is defined
as usual by

(6.1)    $\|f\|_\infty = \sup \{|f(x)| : x \in E\}$.

It is easy to see that this is a norm on $\ell^\infty(E)$, because of the triangle
inequality for the ordinary absolute value on $\mathbf{R}$ or modulus on
$\mathbf{C}$.

7 Summable functions 

A real or complex-valued function $f(x)$ on a nonempty set $E$ is said to be
summable if the partial sums $\sum_{x \in A} |f(x)|$ over nonempty finite subsets
$A$ of $E$ are uniformly bounded. This is equivalent to the convergence of
$\sum_{x \in E} |f(x)|$ in the sense of Section 3, whose value is equal to the
supremum of $\sum_{x \in A} |f(x)|$ over all nonempty finite sets
$A \subseteq E$. Let $\ell^1(E, \mathbf{R})$, $\ell^1(E, \mathbf{C})$ be the spaces
of summable real or complex-valued functions on $E$, respectively, which may
also be denoted by $\ell^1(E)$ to include both cases at the same time. It is easy
to see that these are vector spaces with respect to pointwise addition and scalar
multiplication, and that

(7.1)    $\|f\|_1 = \sum_{x \in E} |f(x)|$

defines a norm on these spaces.

8 p-Summable functions 

Let $f(x)$ be a real or complex-valued function on a nonempty set $E$, and let
$p$ be a positive real number. If $|f(x)|^p$ is a summable function on $E$, then
we say that $f(x)$ is $p$-summable on $E$. The spaces of real or complex-valued
$p$-summable functions on $E$ are denoted $\ell^p(E, \mathbf{R})$,
$\ell^p(E, \mathbf{C})$, respectively, or simply $\ell^p(E)$ to include both cases
at the same time. One can check that these are vector spaces over the real or
complex numbers, as appropriate, with respect to pointwise addition and scalar
multiplication of functions.



If $f$ is a $p$-summable function on $E$, then put

(8.1)    $\|f\|_p = \Big( \sum_{x \in E} |f(x)|^p \Big)^{1/p}$.

This satisfies the positivity and homogeneity properties of a norm on
$\ell^p(E)$ for every $p > 0$. Let us check that this is a norm on $\ell^p(E)$
when $p \ge 1$. As in Section 5, it suffices to show that the closed unit ball in
$\ell^p(E)$ associated to $\|f\|_p$ is convex when $p \ge 1$. Equivalently, if
$f$, $g$ are $p$-summable functions on $E$ such that $\|f\|_p, \|g\|_p \le 1$,
then we would like to check that

(8.2)    $\|t f + (1 - t) g\|_p \le 1$

when $t \in \mathbf{R}$ and $0 \le t \le 1$. The main point is that

(8.3)    $|t f(x) + (1 - t) g(x)|^p \le (t |f(x)| + (1 - t) |g(x)|)^p \le t |f(x)|^p + (1 - t) |g(x)|^p$

for every $x \in E$, because of the convexity of the function $\phi_p(r) = r^p$
on the nonnegative real numbers when $p \ge 1$. Hence

(8.4)    $\sum_{x \in E} |t f(x) + (1 - t) g(x)|^p \le t \sum_{x \in E} |f(x)|^p + (1 - t) \sum_{x \in E} |g(x)|^p \le 1$.

9 Monotonicity 

Let $p$ be a positive real number, and let $f$ be a real or complex-valued
$p$-summable function on a nonempty set $E$. Clearly

(9.1)    $|f(x)| \le \|f\|_p$

for every $x \in E$, which implies that $f$ is bounded and satisfies

(9.2)    $\|f\|_\infty \le \|f\|_p$.

If $q > p$, then $f$ is also $q$-summable, because

(9.3)    $|f(x)|^q \le \|f\|_\infty^{q-p} |f(x)|^p \le \|f\|_p^{q-p} |f(x)|^p$

for every $x \in E$. Moreover,

(9.4)    $\|f\|_q^q = \sum_{x \in E} |f(x)|^q \le \|f\|_p^{q-p} \sum_{x \in E} |f(x)|^p = \|f\|_p^q$,

and hence

(9.5)    $\|f\|_q \le \|f\|_p$.
If $q = 1$, then we get that

(9.6)    $\Big( \sum_{x \in E} |f(x)| \Big)^p \le \sum_{x \in E} |f(x)|^p$

when $f$ is $p$-summable and $0 < p \le 1$. In particular,

(9.7)    $(a + b)^p \le a^p + b^p$

for every pair of nonnegative real numbers $a$, $b$ when $0 < p \le 1$, by
applying the previous inequality to a set $E$ with exactly two elements.
Conversely, one can apply (9.7) repeatedly to get

(9.8)    $\Big( \sum_{j=1}^n a_j \Big)^p \le \sum_{j=1}^n a_j^p$

for any positive integer $n$ and nonnegative real numbers $a_1, \ldots, a_n$, which
implies the analogous inequality for arbitrary sums by passing to a suitable
limit.
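The monotonicity statements (9.2) and (9.5) are easy to observe numerically on a finite set $E$. The sketch below is an illustration under stated assumptions: the values in `f` and the list of exponents are arbitrary, the helper names `lp_norm` and `sup_norm` are introduced here, and a small tolerance absorbs floating-point rounding.

```python
def lp_norm(values, p):
    """(8.1) for a function on a finite set, given as a list of its values."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def sup_norm(values):
    """(6.1) for a function on a finite set."""
    return max(abs(v) for v in values)

# Illustrative data (an assumption of this sketch, not from the notes).
f = [3.0, -4.0, 0.5, 1.25]
exponents = [0.5, 1.0, 2.0, 4.0, 8.0]
norms = [lp_norm(f, p) for p in exponents]

TOL = 1e-12
# (9.5): the p-norms decrease as the exponent increases.
decreasing = all(norms[i] >= norms[i + 1] - TOL for i in range(len(norms) - 1))
# (9.2): every p-norm dominates the supremum norm.
dominates_sup = all(norm >= sup_norm(f) - TOL for norm in norms)

print(decreasing, dominates_sup)
```

Note that exponents below 1 are included on purpose, since (9.2) and (9.5) hold for every positive $p$, not only when $\|f\|_p$ is a norm.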

10 p-Norms, $0 < p \le 1$

Let $V$ be a vector space over the real or complex numbers, and let $\|v\|$ be a
nonnegative real-valued function on $V$ such that $\|v\| > 0$ when $v \ne 0$ and

(10.1)    $\|t v\| = |t| \, \|v\|$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate. We say
that $\|v\|$ is a $p$-norm, $0 < p \le 1$, if in addition

(10.2)    $\|v + w\|^p \le \|v\|^p + \|w\|^p$

for every $v, w \in V$. This reduces to the ordinary triangle inequality (5.2)
when $p = 1$, so that a 1-norm is the same as a norm. For example, $\|f\|_p$
defines a $p$-norm on $\ell^p(E)$ for any nonempty set $E$ when $0 < p \le 1$,
because of (9.7). Equivalently, $\|v\|$ is a $p$-norm when

(10.3)    $\|v + w\| \le (\|v\|^p + \|w\|^p)^{1/p}$

for every $v, w \in V$. As in the previous section, the right side of this
inequality is monotone decreasing in $p$. Hence a $p$-norm is also a
$\widetilde{p}$-norm when $0 < \widetilde{p} \le p \le 1$.

Let $B_1$ be the closed unit ball associated to $\|v\|$, as in (5.4). If $\|v\|$
is a $p$-norm, then

(10.4)    $a v + b w \in B_1$

whenever $v, w \in B_1$ and $a$, $b$ are nonnegative real numbers such that
$a^p + b^p \le 1$. Conversely, let us check that this property implies that
$\|v\|$ is a $p$-norm, as in Section 5. Let $v$, $w$ be nonzero vectors in $V$,
and put $v' = v / \|v\|$, $w' = w / \|w\|$, as before. Also put

(10.5)    $a = \frac{\|v\|}{(\|v\|^p + \|w\|^p)^{1/p}}, \quad b = \frac{\|w\|}{(\|v\|^p + \|w\|^p)^{1/p}}$.

Thus

(10.6)    $a^p + b^p = \frac{\|v\|^p}{\|v\|^p + \|w\|^p} + \frac{\|w\|^p}{\|v\|^p + \|w\|^p} = 1$,

and hence

(10.7)    $a v' + b w' = \frac{v + w}{(\|v\|^p + \|w\|^p)^{1/p}} \in B_1$.

This implies the $p$-norm version of the triangle inequality when $v, w \ne 0$,
and of course it is trivial when $v$ or $w$ is equal to 0.
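The difference between a norm and a $p$-norm can be seen already in $\mathbf{R}^2$ with $p = 1/2$. In the following sketch (the vectors and the helper name `lp_quasi_norm` are hypothetical choices for illustration), the ordinary triangle inequality (5.2) fails for $\|f\|_p$, while the $p$-norm inequality (10.2) holds.

```python
def lp_quasi_norm(values, p):
    """The quantity (8.1) on a finite set; only a p-norm when 0 < p < 1."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

p = 0.5
f = [1.0, 0.0]
g = [0.0, 1.0]
f_plus_g = [a + b for a, b in zip(f, g)]

norm_f = lp_quasi_norm(f, p)
norm_g = lp_quasi_norm(g, p)
norm_sum = lp_quasi_norm(f_plus_g, p)

# Ordinary triangle inequality (5.2): fails for this pair when p = 1/2.
ordinary_holds = norm_sum <= norm_f + norm_g

# p-norm inequality (10.2): holds (a tiny tolerance guards against rounding).
p_version_holds = norm_sum ** p <= norm_f ** p + norm_g ** p + 1e-12

print(ordinary_holds, p_version_holds)
```

Geometrically, the failure of (5.2) corresponds to the unit ball $B_1$ not being convex: the midpoint of `f` and `g` lies outside it.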



11 Metric spaces 

Remember that a metric space is a set $M$ with a nonnegative real-valued
function $d(x, y)$ defined for $x, y \in M$ such that $d(x, y) = 0$ if and only if
$x = y$,

(11.1)    $d(y, x) = d(x, y)$

for every $x, y \in M$, and

(11.2)    $d(x, z) \le d(x, y) + d(y, z)$

for every $x, y, z \in M$. If $V$ is a real or complex vector space equipped with
a norm $\|v\|$, then

(11.3)    $d(v, w) = \|v - w\|$

is a metric on $V$. Similarly, if $\|v\|$ is a $p$-norm on $V$ for some $p$,
$0 < p \le 1$, then

(11.4)    $d(v, w) = \|v - w\|^p$

is a metric on $V$.

Let $(M, d(x, y))$ be a metric space. A sequence $\{x_j\}_{j=1}^\infty$ of
elements of $M$ is said to converge to $x \in M$ if for every $\epsilon > 0$ there
is an $L \ge 1$ such that

(11.5)    $d(x_j, x) < \epsilon$

for every $j \ge L$. We say that $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence if for
every $\epsilon > 0$ there is an $L \ge 1$ such that

(11.6)    $d(x_j, x_l) < \epsilon$

for every $j, l \ge L$. It is easy to check that every convergent sequence is a
Cauchy sequence, and a metric space is said to be complete if every Cauchy
sequence converges to an element of the space. For example, it is well known that
the real and complex numbers are complete with respect to their standard metrics.
If $\{x_j\}_{j=1}^\infty$ is a sequence of elements of $M$ with the property that

(11.7)    $\sum_{j=1}^\infty d(x_j, x_{j+1})$

converges, then $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $M$. This uses the
triangle inequality to get that

(11.8)    $d(x_k, x_l) \le \sum_{j=k}^{l-1} d(x_j, x_{j+1})$

when $k < l$. If $M$ is complete, then it follows that $\{x_j\}_{j=1}^\infty$
converges in $M$. Conversely, if $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in
$M$, then there is a subsequence $\{x_{j_n}\}_{n=1}^\infty$ of
$\{x_j\}_{j=1}^\infty$ such that

(11.9)    $d(x_{j_n}, x_{j_{n+1}}) < 2^{-n}$

for each $n$, which implies that

(11.10)    $\sum_{n=1}^\infty d(x_{j_n}, x_{j_{n+1}})$

converges. If this subsequence converges, then $\{x_j\}_{j=1}^\infty$ converges to
the same limit, because it is a Cauchy sequence.

Let $E$ be a nonempty set, and consider $\ell^p(E)$, $0 < p \le \infty$. This is a
metric space with respect to the metric associated to the norm $\|f\|_p$ when
$p \ge 1$, or the $p$-norm $\|f\|_p$ when $0 < p \le 1$, and it is well known that
this space is complete. For if $\{f_j\}_{j=1}^\infty$ is a Cauchy sequence in
$\ell^p(E)$, then it is easy to see that $\{f_j(x)\}_{j=1}^\infty$ is a Cauchy
sequence in $\mathbf{R}$ or $\mathbf{C}$ for each $x \in E$, as appropriate. This
implies that $\{f_j(x)\}_{j=1}^\infty$ converges pointwise on $E$, since the real
and complex numbers are complete. One can check that the limit $f(x)$ is also in
$\ell^p(E)$, and that $\{f_j\}_{j=1}^\infty$ converges to $f$ in the $\ell^p$
metric, as desired.



12 Infinite series 

Let $V$ be a real or complex vector space equipped with a norm or $p$-norm
$\|v\|$, $0 < p \le 1$. This determines a natural metric on $V$, as in the previous
section. As usual, an infinite series $\sum_{j=1}^\infty v_j$ with terms
$v_j \in V$ is said to converge if the corresponding sequence of partial sums
$\sum_{j=1}^n v_j$ converges in $V$ as $n \to \infty$. Let us say that
$\sum_{j=1}^\infty v_j$ converges absolutely if

(12.1)    $\sum_{j=1}^\infty \|v_j\|$

converges when $\|v\|$ is a norm, and if

(12.2)    $\sum_{j=1}^\infty \|v_j\|^p$

converges when $\|v\|$ is a $p$-norm. Note that the convergence of (12.2) is more
restrictive as $p$ decreases, as in Section 9. As in the previous section,
absolute convergence of $\sum_{j=1}^\infty v_j$ implies that the sequence of
partial sums $\sum_{j=1}^n v_j$ is a Cauchy sequence. In particular, absolute
convergence implies convergence when $V$ is complete. Conversely, $V$ is complete
if every absolutely convergent series with terms in $V$ converges in $V$, by
another argument mentioned in the previous section.
13 $c_0(E)$



Let $E$ be a nonempty set, and let $f(x)$ be a real or complex-valued function on
$E$. We say that $f$ vanishes at infinity on $E$ if for every $\epsilon > 0$,
$|f(x)| \ge \epsilon$ for only finitely many $x \in E$. The spaces of real or
complex-valued functions on $E$ that vanish at infinity are denoted
$c_0(E, \mathbf{R})$, $c_0(E, \mathbf{C})$, respectively, and are vector spaces
with respect to pointwise addition and scalar multiplication of functions. As
usual, we may also use $c_0(E)$ to refer to both cases at the same time. Note
that $f(x) \ne 0$ for only finitely or countably many $x \in E$ when
$f \in c_0(E)$.

If $f$ vanishes at infinity on $E$, then $f$ is bounded, and so $c_0(E)$ is a
linear subspace of $\ell^\infty(E)$. More precisely, one can check that $c_0(E)$
is a closed linear subspace of $\ell^\infty(E)$ with respect to the $\ell^\infty$
norm. A function $f$ on $E$ is said to have finite support if $f(x) \ne 0$ for
only finitely many $x \in E$, in which case it obviously vanishes at infinity. One
can also check that functions with finite support are dense in $c_0(E)$ with
respect to the $\ell^\infty$ norm, so that $c_0(E)$ is the same as the closure in
$\ell^\infty(E)$ of the linear subspace of functions with finite support.

If a function $f$ on $E$ is $p$-summable for some $p > 0$, then $f$ vanishes at
infinity on $E$. More precisely, the number of $x \in E$ such that
$|f(x)| \ge \epsilon$ is less than or equal to

(13.1)    $\epsilon^{-p} \sum_{x \in E} |f(x)|^p$.

Of course, a function $f$ with finite support on $E$ is $p$-summable for every
$p > 0$. It is not difficult to show that functions with finite support on $E$
are dense in $\ell^p(E)$ when $0 < p < \infty$.
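The counting bound (13.1) can be checked directly on a finite example. The sketch below assumes, purely for illustration, that $E = \{1, \ldots, 1000\}$, $f(j) = 1/j$, $p = 2$, and $\epsilon = 1/10$; exact rational arithmetic keeps the comparison free of rounding issues.

```python
from fractions import Fraction

# Illustrative data (assumptions of this sketch, not from the notes).
E = range(1, 1001)
p = 2
eps = Fraction(1, 10)

def f(j):
    """Hypothetical function on E: f(j) = 1/j."""
    return Fraction(1, j)

# Number of points of E where |f(x)| >= eps.
count = sum(1 for x in E if abs(f(x)) >= eps)

# Right side of (13.1): eps^{-p} times the sum of |f(x)|^p over E.
bound = sum(abs(f(x)) ** p for x in E) / eps ** p

print(count, float(bound), count <= bound)
```

The bound is typically far from sharp, but it is all that is needed to see that a $p$-summable function vanishes at infinity.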



14 Generalized convergence, 2 



Let $E$ be a nonempty set, let $V$ be a real or complex vector space with a norm
or $p$-norm $\|v\|$, $0 < p \le 1$, and let $f(x)$ be a $V$-valued function on
$E$. We say that $\sum_{x \in E} f(x)$ converges in the generalized sense if there
is a $\lambda \in V$ such that for every $\epsilon > 0$ there is a finite set
$A_\epsilon \subseteq E$ such that

(14.1)    $\Big\| \sum_{x \in B} f(x) - \lambda \Big\| < \epsilon$

whenever $B \subseteq E$ is a finite set that satisfies
$A_\epsilon \subseteq B$. It is easy to see that $\lambda$ is unique when it
exists, in which case it may be denoted $\sum_{x \in E} f(x)$. Of course, this is
the same as the definition in Section 3 when $V = \mathbf{R}$ or $\mathbf{C}$, and
it is equivalent to the convergence of the net of partial sums of $f(x)$ over
finite subsets of $E$ as in Section 4.

Similarly, we say that $\sum_{x \in E} f(x)$ satisfies the generalized Cauchy
criterion if for every $\epsilon > 0$ there is a finite set
$A_\epsilon \subseteq E$ such that

(14.2)    $\Big\| \sum_{x \in B} f(x) \Big\| < \epsilon$

whenever $B \subseteq E$ is a finite set with $A_\epsilon \cap B = \emptyset$. If
$\sum_{x \in E} f(x)$ converges in the generalized sense, then it is easy to see
that $\sum_{x \in E} f(x)$ satisfies the generalized Cauchy criterion.
Conversely, let us check that $\sum_{x \in E} f(x)$ converges in the generalized
sense when $\sum_{x \in E} f(x)$ satisfies the generalized Cauchy criterion and
$V$ is complete.

If $\sum_{x \in E} f(x)$ satisfies the generalized Cauchy criterion, then it is
easy to see that $\|f(x)\|$ vanishes at infinity on $E$, by considering sets
$B \subseteq E$ with only one element in the previous definition. In particular,
$f(x) \ne 0$ for only finitely or countably many $x \in E$. If $f(x) \ne 0$ for
only finitely many $x \in E$, then convergence of the sum is trivial, and so we
suppose that $f(x) \ne 0$ for countably many $x$. Let $\{x_j\}_{j=1}^\infty$ be an
enumeration of the set of $x \in E$ such that $f(x) \ne 0$, so that each element
of this set occurs in the sequence exactly once, and consider the infinite series
$\sum_{j=1}^\infty f(x_j)$. Using the generalized Cauchy criterion for
$\sum_{x \in E} f(x)$, one can check that the sequence of partial sums of
$\sum_{j=1}^\infty f(x_j)$ forms a Cauchy sequence in $V$. If $V$ is complete,
then it follows that $\sum_{j=1}^\infty f(x_j)$ converges in $V$. Using the
generalized Cauchy criterion for $\sum_{x \in E} f(x)$ again, one can show that
$\sum_{x \in E} f(x)$ converges in the generalized sense, and that the sum is the
same as $\sum_{j=1}^\infty f(x_j)$.

15 Summable functions, 2 

Let $E$ be a nonempty set, and let $V$ be a real or complex vector space equipped
with a norm or $p$-norm $\|v\|$ for $0 < p \le 1$. Suppose that $f$ is a
$V$-valued function on $E$ such that $\|f(x)\|$ is summable on $E$ when $\|v\|$ is
a norm on $V$, or that $\|f(x)\|^p$ is summable on $E$ when $\|v\|$ is a $p$-norm,
$0 < p \le 1$. If $B \subseteq E$ is a finite set, then we have that

(15.1)    $\Big\| \sum_{x \in B} f(x) \Big\| \le \sum_{x \in B} \|f(x)\|$

in the first case, and

(15.2)    $\Big\| \sum_{x \in B} f(x) \Big\|^p \le \sum_{x \in B} \|f(x)\|^p$

in the second case. In both cases, one can use these simple estimates to check
that $\sum_{x \in E} f(x)$ satisfies the generalized Cauchy criterion. If $V$ is
complete, then it follows that $\sum_{x \in E} f(x)$ converges in the generalized
sense, as in the previous section.

16 A special case 

Let $E$ be a nonempty set, and suppose that $\phi \in \ell^p(E)$ for some $p$,
$0 < p \le \infty$. For each $x \in E$, let $\delta_x(y)$ be the function on $E$
defined by $\delta_x(x) = 1$ and $\delta_x(y) = 0$ when $y \ne x$. Consider

(16.1)    $f(x) = \phi(x) \, \delta_x$
as a function on $E$ with values in $\ell^p(E)$. Observe that

(16.2)    $\sum_{x \in E} f(x)(y) = \sum_{x \in E} \phi(x) \, \delta_x(y) = \phi(y)$

for each $y \in E$, where these are sums over $x \in E$ of real or complex numbers
that are equal to 0 when $x \ne y$ and hence converge trivially. One can also ask
about the convergence of $\sum_{x \in E} f(x)$ in the generalized sense to
$\phi$, as a sum of elements of $\ell^p(E)$. Of course,

(16.3)    $\|f(x)\|_p = |\phi(x)| \, \|\delta_x\|_p = |\phi(x)|$

for every $x \in E$. Thus $\|f(x)\|_p$ is $p$-summable on $E$ when
$0 < p < \infty$, and bounded on $E$ when $p = \infty$. If $0 < p \le 1$, then this
is the same as the summability condition mentioned in the previous section.
However, one can check that $\sum_{x \in E} f(x)$ converges to $\phi$ in the
generalized sense in $\ell^p(E)$ for every positive real number $p$. If
$p = \infty$, then $\sum_{x \in E} f(x)$ converges to $\phi$ in the generalized
sense in $\ell^\infty(E)$ if and only if $\phi \in c_0(E)$.



17 Inner product spaces 

An inner product on a real or complex vector space $V$ is a real or
complex-valued function $\langle v, w \rangle$, as appropriate, defined for
$v, w \in V$ and satisfying the following three conditions. First,
$\langle v, w \rangle$ is a linear function of $v$ for each $w \in V$. Second,

(17.1)    $\langle w, v \rangle = \langle v, w \rangle$

for every $v, w \in V$ in the real case, and

(17.2)    $\langle w, v \rangle = \overline{\langle v, w \rangle}$

in the complex case. In particular,

(17.3)    $\langle v, v \rangle = \overline{\langle v, v \rangle} \in \mathbf{R}$

for every $v \in V$ in the complex case. Third,

(17.4)    $\langle v, v \rangle > 0$

for every $v \in V$ with $v \ne 0$.
Put

(17.5)    $\|v\| = \langle v, v \rangle^{1/2}$.

The Cauchy-Schwarz inequality states that

(17.6)    $|\langle v, w \rangle| \le \|v\| \, \|w\|$

for every $v, w \in V$. Using this, one can show that

(17.7)    $\|v + w\| \le \|v\| + \|w\|$

for every $v, w \in V$, so that $\|v\|$ defines a norm on $V$. If $V$ is complete
with respect to this norm, then $V$ is said to be a Hilbert space.

Let $E$ be a nonempty set, and let $f, g \in \ell^2(E)$ be given. Remember that

(17.8)    $a b \le \frac{a^2 + b^2}{2}$

for every $a, b \ge 0$, since $(a - b)^2 \ge 0$, so that

(17.9)    $\sum_{x \in E} |f(x)| \, |g(x)| \le \frac{1}{2} \sum_{x \in E} |f(x)|^2 + \frac{1}{2} \sum_{x \in E} |g(x)|^2 < +\infty$.

Thus $|f(x)| \, |g(x)|$ is summable on $E$, and it is easy to see that

(17.10)    $\langle f, g \rangle = \sum_{x \in E} f(x) \, g(x)$

defines an inner product on $\ell^2(E, \mathbf{R})$, and that

(17.11)    $\langle f, g \rangle = \sum_{x \in E} f(x) \, \overline{g(x)}$

defines an inner product on $\ell^2(E, \mathbf{C})$. The corresponding norm is the
same as the $\ell^2$ norm discussed in Section 8. These spaces are also complete,
as in Section 11, and are therefore Hilbert spaces.

A pair of vectors $v$, $w$ in an inner product space $V$ are said to be
orthogonal if

(17.12)    $\langle v, w \rangle = 0$.

This may also be expressed by $v \perp w$. In this case,

(17.13)    $\|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle w, w \rangle = \|v\|^2 + \|w\|^2$.

If $v_1, \ldots, v_n \in V$ and $v_j \perp v_l$ when $j \ne l$, then we get that

(17.14)    $\Big\| \sum_{j=1}^n v_j \Big\|^2 = \sum_{j=1}^n \|v_j\|^2$.

18 Inner product spaces, 2 



Let $E$ be a nonempty set, let $(V, \langle v, w \rangle)$ be an inner product
space, and let $f$ be a $V$-valued function on $E$ such that

(18.1)    $f(x) \perp f(y)$

when $x \ne y$. Thus

(18.2)    $\Big\| \sum_{x \in B} f(x) \Big\|^2 = \sum_{x \in B} \|f(x)\|^2$

for every finite set $B \subseteq E$, as in the previous section. If
$\|f(x)\|^2$ is a summable function on $E$, then it follows that
$\sum_{x \in E} f(x)$ satisfies the generalized Cauchy criterion, and hence
converges in the generalized sense when $V$ is complete. In this case, one can
also check that

(18.3)    $\Big\| \sum_{x \in E} f(x) \Big\|^2 = \sum_{x \in E} \|f(x)\|^2$.

19 Infinite series, 2 

Let $V$ be a real or complex vector space equipped with a norm or $p$-norm
$\|v\|$, $0 < p \le 1$, and let $\sum_{j=1}^\infty v_j$ be an infinite series with
terms in $V$. This can also be considered as a sum over $E = \mathbf{Z}_+$, so
that the notions of convergence in the generalized sense and the generalized
Cauchy criterion are applicable. If $\sum_{j=1}^\infty v_j$ converges in the
ordinary sense and satisfies the generalized Cauchy criterion as a sum over
$\mathbf{Z}_+$, then it is easy to see that $\sum_{j=1}^\infty v_j$ converges in
the generalized sense, and to the same sum.

Suppose that $\sum_{j=1}^\infty v_j$ does not satisfy the generalized Cauchy
criterion. This means that there is an $\epsilon > 0$ such that for each finite
set $A \subseteq \mathbf{Z}_+$ there is another finite set
$B \subseteq \mathbf{Z}_+$ such that $A \cap B = \emptyset$ and

(19.1)    $\Big\| \sum_{j \in B} v_j \Big\| \ge \epsilon$.

Using this repeatedly, one can get finite subsets $A_n$, $B_n$ of $\mathbf{Z}_+$
such that $\{1, \ldots, n\} \subseteq A_n$, $A_n \cap B_n = \emptyset$,
$A_n \cup B_n \subseteq A_{n+1}$, and

(19.2)    $\Big\| \sum_{j \in B_n} v_j \Big\| \ge \epsilon$

for each $n$. Let $k_n$ be the number of elements of $A_n$ and $l_n$ be the number
of elements of $B_n$, so that $n \le k_n < k_n + l_n \le k_{n+1}$ for each $n$.
Also let $\pi$ be a one-to-one mapping of $\mathbf{Z}_+$ onto itself such that
$A_n = \{\pi(1), \ldots, \pi(k_n)\}$ and
$B_n = \{\pi(k_n + 1), \ldots, \pi(k_n + l_n)\}$ for each $n$. This is easy to
arrange, because of the inclusion and disjointness properties of the $A_n$'s and
$B_n$'s. Thus

(19.3)    $\Big\| \sum_{j = k_n + 1}^{k_n + l_n} v_{\pi(j)} \Big\| \ge \epsilon$

for each $n$. This implies that the partial sums of
$\sum_{j=1}^\infty v_{\pi(j)}$ do not form a Cauchy sequence, and in particular
that $\sum_{j=1}^\infty v_{\pi(j)}$ does not converge in the ordinary sense.



If $\sum_{j=1}^\infty v_j$ satisfies the generalized Cauchy criterion, then it is
easy to see that the partial sums of every rearrangement
$\sum_{j=1}^\infty v_{\pi(j)}$ of $\sum_{j=1}^\infty v_j$ form a Cauchy sequence.
Conversely, if the partial sums of every rearrangement of
$\sum_{j=1}^\infty v_j$ form a Cauchy sequence, then $\sum_{j=1}^\infty v_j$
satisfies the generalized Cauchy criterion, by the argument in the preceding
paragraph. Similarly, every rearrangement of $\sum_{j=1}^\infty v_j$ converges to
the same sum when $\sum_{j=1}^\infty v_j$ converges in the generalized sense.
Conversely, if every rearrangement of $\sum_{j=1}^\infty v_j$ converges, then
$\sum_{j=1}^\infty v_j$ satisfies the generalized Cauchy criterion, by the
previous remarks. Hence $\sum_{j=1}^\infty v_j$ converges in the generalized
sense, because it converges in the ordinary sense, as mentioned at the beginning
of the section.
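A standard example of a series that converges in the ordinary sense but fails the generalized Cauchy criterion is the alternating harmonic series $v_j = (-1)^{j+1}/j$ in $V = \mathbf{R}$. The sketch below illustrates (19.1) for this series: given a finite set $A$, it collects odd indices beyond $A$ until their terms add up to at least $\epsilon$. The function name `violating_set` and the specific choices of $A$ and $\epsilon$ are assumptions made for illustration.

```python
from fractions import Fraction

def v(j):
    """Terms of the alternating harmonic series: v_j = (-1)^(j+1) / j."""
    return Fraction((-1) ** (j + 1), j)

def violating_set(A, eps):
    """Return (B, total) with B finite, disjoint from A, and total >= eps.

    B consists of consecutive odd indices larger than every element of A.
    The loop terminates because the sum of 1/j over odd j diverges.
    """
    j = max(A, default=0) + 1
    if j % 2 == 0:
        j += 1  # move to the next odd index, where v_j = 1/j > 0
    B, total = [], Fraction(0)
    while total < eps:
        B.append(j)
        total += v(j)
        j += 2
    return B, total

A = set(range(1, 101))
eps = Fraction(1, 2)
B, total = violating_set(A, eps)
print(len(B), float(total))
```

Since such a set $B$ exists for every finite $A$, the construction in this section produces a rearrangement whose partial sums are not a Cauchy sequence.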



20 Hölder's inequality



Let $E$ be a nonempty set, and suppose that $1 \le p, q \le \infty$ are conjugate
exponents in the sense that

(20.1)    $\frac{1}{p} + \frac{1}{q} = 1$.

If $f \in \ell^p(E)$ and $g \in \ell^q(E)$, then Hölder's inequality states that
$f g \in \ell^1(E)$, and that

(20.2)    $\|f g\|_1 \le \|f\|_p \, \|g\|_q$.

This is quite straightforward when $p = 1$, $q = \infty$ or $p = \infty$, $q = 1$,
and so we focus now on the case where $1 < p, q < \infty$. Note that the
$p = q = 2$ case is another version of the Cauchy-Schwarz inequality.
If $a$, $b$ are nonnegative real numbers, then

(20.3)    $a b \le \frac{a^p}{p} + \frac{b^q}{q}$.

This can be seen as a consequence of the convexity of the exponential function.
In particular,

(20.4)    $|f(x)| \, |g(x)| \le \frac{|f(x)|^p}{p} + \frac{|g(x)|^q}{q}$

for every $x \in E$. Hence

(20.5)    $\sum_{x \in B} |f(x)| \, |g(x)| \le \frac{1}{p} \sum_{x \in B} |f(x)|^p + \frac{1}{q} \sum_{x \in B} |g(x)|^q \le \frac{\|f\|_p^p}{p} + \frac{\|g\|_q^q}{q}$

for every finite set $B \subseteq E$. This implies that $f g$ is summable on $E$,
with

(20.6)    $\|f g\|_1 \le \frac{\|f\|_p^p}{p} + \frac{\|g\|_q^q}{q}$.

This implies Hölder's inequality when $\|f\|_p = \|g\|_q = 1$. Otherwise, if
$f, g \ne 0$, then we can apply this to

(20.7)    $f' = \frac{f}{\|f\|_p}, \quad g' = \frac{g}{\|g\|_q}$.

Thus $f' \in \ell^p(E)$, $g' \in \ell^q(E)$, $\|f'\|_p = \|g'\|_q = 1$, and the
previous inequality implies that

(20.8)    $\frac{\|f g\|_1}{\|f\|_p \, \|g\|_q} = \|f' g'\|_1 \le 1$.

Of course, Hölder's inequality is trivial when either $f$ or $g$ is identically 0
on $E$.
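Hölder's inequality (20.2) can be tested numerically on a finite set. In the sketch below, the data `f`, `g` and the exponent `p = 3` are arbitrary illustrative choices, and `holder_sides` is a helper name introduced here. The second computation uses $g = |f|^{p-1}$, a standard case in which the two sides of (20.2) agree.

```python
def lp_norm(values, p):
    """(8.1) for a function on a finite set, given as a list of its values."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_sides(f, g, p):
    """Return (||f g||_1, ||f||_p ||g||_q) for the conjugate exponent q of p.

    Assumes 1 < p < infinity, so that q = p / (p - 1) is finite.
    """
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    rhs = lp_norm(f, p) * lp_norm(g, q)
    return lhs, rhs

# Illustrative data (assumptions of this sketch, not from the notes).
p = 3.0
f = [1.0, -2.0, 3.0, 0.5]
g = [2.0, 0.25, -1.0, 4.0]
lhs, rhs = holder_sides(f, g, p)
print(lhs <= rhs + 1e-12)

# Equality case: g = |f|^(p - 1).
g_extremal = [abs(a) ** (p - 1.0) for a in f]
lhs_eq, rhs_eq = holder_sides(f, g_extremal, p)
print(abs(lhs_eq - rhs_eq) < 1e-9)
```

The tolerances are there only to absorb floating-point rounding in the fractional powers.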



21 Bounded linear functionals 

Let $V$ be a vector space over the real or complex numbers. A linear functional
on $V$ is simply a linear mapping from $V$ into $\mathbf{R}$ or $\mathbf{C}$, as
appropriate. Suppose now that $V$ is also equipped with a norm $\|v\|$. A linear
functional $\lambda$ on $V$ is said to be bounded with respect to this norm if
there is a nonnegative real number $C$ such that

(21.1)    $|\lambda(v)| \le C \, \|v\|$

for every $v \in V$. In this case, we put

(21.2)    $\|\lambda\|_* = \sup \{|\lambda(v)| : v \in V, \ \|v\| \le 1\}$,

which is the same as the smallest $C \ge 0$ for which the previous inequality
holds. The boundedness of a linear functional $\lambda$ on $V$ implies that

(21.3)    $|\lambda(v) - \lambda(w)| = |\lambda(v - w)| \le C \, \|v - w\|$

for some $C \ge 0$ and every $v, w \in V$. This shows that a bounded linear
functional $\lambda$ is uniformly continuous on $V$. Conversely, if a linear
functional $\lambda$ on $V$ is continuous at 0, then there is a $\delta > 0$ such
that

(21.4)    $|\lambda(v)| < 1$

for every $v \in V$ with $\|v\| < \delta$. This implies that $\lambda$ is bounded,
with $C = 1/\delta$.

The space of arbitrary linear functionals on $V$ is a vector space with respect
to pointwise addition and scalar multiplication of functions. It is easy to see
that the space $V^*$ of bounded linear functionals on $V$ is also a vector space
in this way, and that $\|\lambda\|_*$ defines a norm on $V^*$, known as the dual
norm. Note that $V^*$ is automatically complete with respect to the dual norm.
For if $\{\lambda_j\}_{j=1}^\infty$ is a Cauchy sequence of bounded linear
functionals on $V$ with respect to the dual norm, then
$\{\lambda_j(v)\}_{j=1}^\infty$ is a Cauchy sequence of real or complex numbers,
as appropriate, for each $v \in V$. Hence $\{\lambda_j(v)\}_{j=1}^\infty$
converges in $\mathbf{R}$ or $\mathbf{C}$ for each $v \in V$, by completeness. It
is easy to see that the limit defines a linear functional $\lambda$ on $V$, which
is also bounded because the $\lambda_j$'s have uniformly bounded dual norms. One
can also show that $\{\lambda_j\}_{j=1}^\infty$ converges to $\lambda$ with
respect to the dual norm, using the fact that $\{\lambda_j\}_{j=1}^\infty$ is a
Cauchy sequence with respect to the dual norm.

The definitions of bounded linear functionals and the dual norm also make
sense when $\|v\|$ is a $p$-norm on $V$. The dual space $V^*$ is still a vector
space in this case, and the dual norm is still a norm on $V^*$, and not just a
$p$-norm. The dual space is also complete with respect to the dual norm, but there
are some other problems with the dual space when $\|v\|$ is not a norm, as we
shall see.
22 Hölder's inequality, 2



Let $E$ be a nonempty set, and let $1 \le p, q \le \infty$ be conjugate exponents.
For each $g \in \ell^q(E)$, put

(22.1)    $\lambda_g(f) = \sum_{x \in E} f(x) \, g(x)$

when $f \in \ell^p(E)$. This makes sense, because of Hölder's inequality, and
satisfies

(22.2)    $|\lambda_g(f)| \le \|f\|_p \, \|g\|_q$.

Thus $\lambda_g$ is a bounded linear functional on $\ell^p(E)$, with dual norm
less than or equal to $\|g\|_q$. It is well known and not too difficult to show
that the dual norm of $\lambda_g$ on $\ell^p(E)$ is actually equal to $\|g\|_q$.
If $q = 1$ and $p = \infty$, then one can also restrict $\lambda_g$ to $c_0(E)$.
One can check that the dual norm of the restriction of $\lambda_g$ to $c_0(E)$
with respect to the $\ell^\infty$ norm is also equal to $\|g\|_1$.

It is also well known that every bounded linear functional $\lambda$ on
$\ell^p(E)$ is of the form $\lambda_g$ for some $g \in \ell^q(E)$ when
$1 \le p < \infty$, and that every bounded linear functional on $c_0(E)$ with
respect to the $\ell^\infty$ norm is of the form $\lambda_g$ for some
$g \in \ell^1(E)$. The basic idea is to put

(22.3)    $g(x) = \lambda(\delta_x)$,

where $\delta_x(x) = 1$ and $\delta_x(y) = 0$ when $y \in E$ and $y \ne x$. Using
the boundedness of $\lambda$, one can show that $g \in \ell^q(E)$. By
construction,

(22.4)    $\lambda(f) = \lambda_g(f)$

when $f(x) \ne 0$ for only finitely many $x \in E$. This implies the same relation
for every $f \in \ell^p(E)$, $1 \le p < \infty$, or $f \in c_0(E)$, as appropriate,
because of the density of functions with finite support on $E$ in these spaces.

If $0 < p \le 1$, then $\ell^p(E) \subseteq \ell^1(E)$, and $\|f\|_1 \le \|f\|_p$ for every $f \in \ell^p(E)$. It
follows that the restriction of a bounded linear functional on $\ell^1(E)$ to $\ell^p(E)$ is
a bounded linear functional with respect to the p-norm $\|f\|_p$. In particular, if
$g \in \ell^\infty(E)$, then the restriction of $\lambda_g$ to $\ell^p(E)$ is a bounded linear functional
with dual norm less than or equal to $\|g\|_\infty$ with respect to $\|f\|_p$. One can
check that the dual norm of $\lambda_g$ on $\ell^p(E)$ is actually equal to $\|g\|_\infty$, because
$\lambda_g(\delta_x) = g(x)$ and $\|\delta_x\|_p = 1$ for each $x \in E$.

Conversely, if $\lambda$ is a bounded linear functional on $\ell^p(E)$, $0 < p < 1$, then
$\lambda = \lambda_g$ for some $g \in \ell^\infty(E)$. The proof is basically the same as when $p = 1$. If
$g$ is as in (22.3), then $g$ is bounded, and the $\ell^\infty$ norm of $g$ is less than or equal
to the dual norm of $\lambda$ on $\ell^p(E)$, because $\|\delta_x\|_p = 1$ for each $x \in E$. One can
then use density of functions with finite support in $\ell^p(E)$ to show that $\lambda = \lambda_g$.

23 Hilbert spaces 

Let $(V, \langle v, w \rangle)$ be a real or complex inner product space, and put

(23.1)    $\lambda_w(v) = \langle v, w \rangle$

for each $w \in V$. By the Cauchy-Schwarz inequality, this is a bounded linear
functional on $V$, with $\|\lambda_w\|_* \le \|w\|$. More precisely,

(23.2)    $\|\lambda_w\|_* = \|w\|$,

because $\lambda_w(w) = \|w\|^2$. If $V$ is complete, then it is well known that every bounded
linear functional on $V$ is of this form. Let us briefly review a proof of this fact.

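Let $Y \subseteq V$, $Y \ne \emptyset$, and $z \in V$ be given, and let $\{y_j\}_{j=1}^\infty$ be a sequence of
elements of $Y$ such that

(23.3)    $\lim_{j \to \infty} \|y_j - z\| = \inf\{\|y - z\| : y \in Y\}$.

Note that

(23.4)    $\left\| \frac{u + v}{2} \right\|^2 + \left\| \frac{u - v}{2} \right\|^2 = \frac{\|u\|^2}{2} + \frac{\|v\|^2}{2}$

for every $u, v \in V$, which is a version of the parallelogram law. Applying this to
$u = y_j - z$, $v = y_l - z$, we get that

(23.5)    $\left\| \frac{y_j + y_l}{2} - z \right\|^2 + \frac{\|y_j - y_l\|^2}{4} = \frac{\|y_j - z\|^2}{2} + \frac{\|y_l - z\|^2}{2}$

for each $j, l \ge 1$. If $Y$ is convex, then $(y_j + y_l)/2 \in Y$ for every $j, l$, and hence

(23.6)    $\inf\{\|y - z\| : y \in Y\} \le \left\| \frac{y_j + y_l}{2} - z \right\|$.

Combining this with (23.3) and (23.5), we get that

(23.7)    $\lim_{j, l \to \infty} \|y_j - y_l\| = 0$.

Thus $\{y_j\}_{j=1}^\infty$ is a Cauchy sequence when $Y$ is convex. If $V$ is complete and $Y$ is
also closed, then $\{y_j\}_{j=1}^\infty$ converges to an element $y$ of $Y$ with minimal distance
to $z$.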

If $Y$ is a linear subspace of $V$, then one can show that $y \in Y$ has minimal
distance to $z \in V$ if and only if $z - y$ is orthogonal to every element of $Y$.
One can also check that $y$ is uniquely determined by these properties. If $V$ is
complete, $Y$ is a closed linear subspace of $V$, and $z \in V$, then it follows from
the preceding paragraph that there is a $y \in Y$ such that $y - z$ is orthogonal to
every element of $Y$.

Let $\lambda$ be a bounded linear functional on $V$, and let

(23.8)    $Y = \{v \in V : \lambda(v) = 0\}$

be the kernel of $\lambda$. Thus $Y$ is a closed linear subspace of $V$, and $Y = V$ if and
only if $\lambda = 0$. If $\lambda \ne 0$, then there is a $w' \in V$ such that $w' \ne 0$ and $w' \perp y$ for
every $y \in Y$, by the discussion in the previous paragraphs. In this case, one can
check that $\lambda = \lambda_w$, where $w$ is a scalar multiple of $w'$. This uses the observation
that $Y$ has codimension 1 in $V$, so that every element of $V$ can be expressed as
a linear combination of $w'$ and an element of $Y$.
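
In the finite-dimensional real case these statements are easy to test directly. The sketch below is a hedged illustration only: it works in $\mathbf{R}^3$ with the standard inner product, the helpers `dot`, `project`, and `riesz_vector` are names chosen here, and the functional `lam` is a hypothetical example. It checks that the residual $z - y$ of the nearest point $y$ in a subspace is orthogonal to the subspace, and that a linear functional is represented by a vector $w$.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def project(z, spanning):
    """Nearest point to z in the span of the given vectors, via Gram-Schmidt."""
    ortho = []
    for b in spanning:
        w = list(b)
        for e in ortho:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n2 = dot(w, w)
        if n2 > 1e-24:  # skip vectors that are (numerically) dependent
            s = n2 ** 0.5
            ortho.append([wi / s for wi in w])
    y = [0.0] * len(z)
    for e in ortho:
        c = dot(z, e)
        y = [yi + c * ei for yi, ei in zip(y, e)]
    return y

def riesz_vector(lam, n):
    """The vector w with lam(v) = <v, w> on R^n, read off from the standard basis."""
    return [lam([1.0 if i == j else 0.0 for j in range(n)]) for i in range(n)]

basis_Y = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
z = [1.0, 2.0, 4.0]
y = project(z, basis_Y)
residual = [zi - yi for zi, yi in zip(z, y)]

def lam(v):
    """A hypothetical linear functional on R^3, used only for illustration."""
    return 2.0 * v[0] - v[1] + 3.0 * v[2]

w = riesz_vector(lam, 3)
print(residual, w, lam(z), dot(z, w))
```

In infinite dimensions the nearest point is obtained from the Cauchy sequence argument rather than from a finite Gram-Schmidt process, so this only mirrors the conclusion, not the proof.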

24 The Hahn-Banach theorem



Let $V$ be a real or complex vector space with a norm $\|v\|$, and let $W$ be a linear
subspace of $V$. The Hahn-Banach theorem states that every bounded linear
functional on $W$ can be extended to a bounded linear functional on $V$ with the
same norm. Note that this theorem does not work for p-norms, $0 < p < 1$. By
standard arguments based on uniform continuity, a bounded linear functional
on $W$ has a unique extension to a bounded linear functional on the closure of
$W$ with the same norm, and this does work for p-norms on $V$.

It follows from the Hahn-Banach theorem that for every $v \in V$ with $v \ne 0$
there is a $\lambda \in V^*$ such that $\|\lambda\|_* = 1$ and

(24.1)    $\lambda(v) = \|v\|$.

More precisely, (24.1) determines a unique linear functional on the 1-dimensional
subspace of $V$ spanned by $v$, and the Hahn-Banach theorem implies that there
is an extension of this linear functional to $V$ with dual norm equal to 1. Note
that this corollary does not hold for $\ell^p(E)$ when $0 < p < 1$ and $E$ has at least
two elements.

Let $V$ be the space of continuous real or complex-valued functions $f$ on the
unit interval $[0, 1]$. If $0 < p < \infty$, then put

(24.2)    $\|f\|_p = \left( \int_0^1 |f(x)|^p \, dx \right)^{1/p}$.

One can check that this is a norm when $p \ge 1$ and a p-norm when $0 < p \le 1$,
in the same way as for $\ell^p$. The counterpart of $\|f\|_p$ for $p = \infty$ is the supremum
norm

(24.3)    $\|f\|_\infty = \sup\{|f(x)| : 0 \le x \le 1\}$.

It is well known that $V$ is complete with respect to the supremum norm, and
not with respect to $\|f\|_p$ when $0 < p < \infty$, for which the completions of $V$ can
be described in terms of Lebesgue integrals.

If $0 < p \le q \le \infty$, then

(24.4)    $\|f\|_p \le \|f\|_q$

for every continuous function $f$ on $[0, 1]$. This is easy to see when $q = \infty$,
and it follows from the convexity of $r^{q/p}$ on the nonnegative real numbers when
$q < \infty$. One can show that the only bounded linear functional on $V$ with
respect to $\|f\|_p$ is the trivial linear functional equal to 0 when $0 < p < 1$. This
is because every continuous function $f$ on $[0, 1]$ can be expressed as $\sum_{l=1}^n f_l$ for
some continuous functions $f_1, \ldots, f_n$ such that $\sum_{l=1}^n \|f_l\|_p$ is arbitrarily small
when $p < 1$. More precisely, one can choose the $f_l$'s to be supported on intervals
of length approximately $1/n$.
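
The splitting argument can be observed numerically for the constant function $f = 1$. The following Python sketch is an illustration under stated assumptions, not part of the original argument: it uses a partition of unity by tent functions centered at the points $l/n$ (one possible choice of continuous pieces), a midpoint rule for the integral, and helper names `p_norm`, `tent`, and `sum_of_piece_norms` introduced here.

```python
def p_norm(func, p, steps=4000):
    """Midpoint-rule approximation of (integral over [0, 1] of |func|^p)^(1/p)."""
    h = 1.0 / steps
    total = sum(abs(func((i + 0.5) * h)) ** p for i in range(steps))
    return (total * h) ** (1.0 / p)

def tent(l, n):
    """Continuous tent function centered at l/n, vanishing outside distance 1/n."""
    return lambda x: max(0.0, 1.0 - n * abs(x - l / n))

def sum_of_piece_norms(n, p):
    """Sum of ||f_l||_p over the pieces f_l = tent(l, n), l = 0, ..., n, which add up to 1."""
    return sum(p_norm(tent(l, n), p) for l in range(n + 1))

for n in (4, 16, 64):
    # For p = 1/2 the sum shrinks roughly like n^(1 - 1/p) = 1/n.
    print(n, sum_of_piece_norms(n, 0.5))
```

For $p = 1$ the same sum stays equal to 1, which is consistent with the fact that the argument breaks down for norms.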

25 Weak summability 

Let $E$ be a nonempty set, and let $V$ be a real or complex vector space with
a norm $\|v\|$. Also let $f(x)$ be a $V$-valued function on $E$ such that $\sum_{x \in E} f(x)$
converges in the generalized sense. If $\lambda$ is a bounded linear functional on $V$,
then $\sum_{x \in E} \lambda(f(x))$ also converges in the generalized sense, and

(25.1)    $\lambda\left( \sum_{x \in E} f(x) \right) = \sum_{x \in E} \lambda(f(x))$.

Of course, $\sum_{x \in E} f(x)$ automatically converges in the generalized sense when
$\|f(x)\|$ is summable on $E$, in which case $\lambda(f(x))$ is summable on $E$ for every
$\lambda \in V^*$, and

(25.2)    $\sum_{x \in E} |\lambda(f(x))| \le \|\lambda\|_* \sum_{x \in E} \|f(x)\|$.

However, we have seen examples where $\sum_{x \in E} f(x)$ converges in the generalized
sense, even though $\|f(x)\|$ is not summable on $E$. If $\phi(x)$ is a real or complex-valued
function on $E$ such that $\sum_{x \in E} \phi(x)$ converges in the generalized sense,
then $\phi(x)$ is summable on $E$. In particular, $\lambda(f(x))$ is a summable function on
$E$ for every $\lambda \in V^*$ when $\sum_{x \in E} f(x)$ converges in the generalized sense.



26 Bounded partial sums 

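Let $V$ be a real or complex vector space with a norm or p-norm $\|v\|$, $0 < p < 1$.
Also let $X(V)$ be the space of sequences $\{v_j\}_{j=1}^\infty$ of elements of $V$ such that
the partial sums $\sum_{j=1}^n v_j$ of $\sum_{j=1}^\infty v_j$ are uniformly bounded in $V$. It is easy to
see that $X(V)$ is a vector space with respect to termwise addition and scalar
multiplication. Moreover,

(26.1)    $\|\{v_j\}_{j=1}^\infty\|_{X(V)} = \sup_{n \ge 1} \left\| \sum_{j=1}^n v_j \right\|$

is a norm or p-norm on $X(V)$, as appropriate. If $\{v_j\}_{j=1}^\infty \in X(V)$, then the
sums $\sum_{j=l}^n v_j$ are uniformly bounded over $1 \le l \le n$, because

(26.2)    $\sum_{j=l}^n v_j = \sum_{j=1}^n v_j - \sum_{j=1}^{l-1} v_j$.

More precisely,

(26.3)    $\left\| \sum_{j=l}^n v_j \right\| \le \left\| \sum_{j=1}^n v_j \right\| + \left\| \sum_{j=1}^{l-1} v_j \right\| \le 2 \, \|\{v_j\}_{j=1}^\infty\|_{X(V)}$

when $\|v\|$ is a norm on $V$. Similarly,

(26.4)    $\left\| \sum_{j=l}^n v_j \right\|^p \le \left\| \sum_{j=1}^n v_j \right\|^p + \left\| \sum_{j=1}^{l-1} v_j \right\|^p \le 2 \, \|\{v_j\}_{j=1}^\infty\|_{X(V)}^p$

when $\|v\|$ is a p-norm on $V$, so that

(26.5)    $\left\| \sum_{j=l}^n v_j \right\| \le 2^{1/p} \, \|\{v_j\}_{j=1}^\infty\|_{X(V)}$.

In particular, $\{v_j\}_{j=1}^\infty$ is bounded, by taking $l = n$.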
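
For a finite list of real or complex terms, the supremum of the partial sums and the bound on the block sums in the norm case can be computed directly. This is a small sketch under the assumption $V = \mathbf{R}$ or $\mathbf{C}$ with the absolute value as norm; the function names `x_norm` and `max_block_sum` are introduced here for illustration.

```python
from itertools import accumulate

def x_norm(terms):
    """sup over n of |v_1 + ... + v_n| for a finite list of real or complex terms."""
    return max((abs(s) for s in accumulate(terms)), default=0.0)

def max_block_sum(terms):
    """sup over 1 <= l <= n of |v_l + ... + v_n|, using differences of partial sums."""
    partial = [0] + list(accumulate(terms))
    return max(abs(partial[n] - partial[l])
               for l in range(len(terms))
               for n in range(l + 1, len(terms) + 1))

terms = [3, -5, 4, -1, 2]
print(x_norm(terms), max_block_sum(terms))  # the second value is at most twice the first
```

Writing each block sum as a difference of two partial sums is exactly the step used to get the factor of 2.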

An infinite series $\sum_{j=1}^\infty v_j$ with terms in $V$ satisfies the ordinary Cauchy
criterion if for every $\epsilon > 0$ there is an $L \ge 1$ such that

(26.6)    $\left\| \sum_{j=l}^n v_j \right\| < \epsilon$

when $n \ge l \ge L$. This is equivalent to saying that the sequence of partial sums
$\sum_{j=1}^n v_j$ is a Cauchy sequence in $V$. Note that the partial sums are bounded in
this case, so that $\{v_j\}_{j=1}^\infty \in X(V)$. Put

(26.7)    $X_0(V) = \left\{ \{v_j\}_{j=1}^\infty \in X(V) : \sum_{j=1}^\infty v_j \text{ satisfies the Cauchy criterion} \right\}$.

It is easy to see that $X_0(V)$ is a linear subspace of $X(V)$, and that $\{v_j\}_{j=1}^\infty$ is an
element of $X_0(V)$ when $v_j = 0$ for all but finitely many $j$. One can also check
that $X_0(V)$ is closed in $X(V)$, and in fact that $X_0(V)$ is the closure in $X(V)$
of the linear subspace of sequences $\{v_j\}_{j=1}^\infty$ such that $v_j = 0$ for all but finitely
many $j$. If $V$ is complete, then $X_0(V)$ is the same as the space of sequences
$\{v_j\}_{j=1}^\infty$ such that $\sum_{j=1}^\infty v_j$ converges in $V$.



27 Bounded finite subsums 



Let $E$ be a nonempty set, and let $V$ be a real or complex vector space with a
norm or p-norm $\|v\|$, $0 < p < 1$. Also let $Y(E, V)$ be the space of $V$-valued
functions $f(x)$ on $E$ such that the sums $\sum_{x \in B} f(x)$ over nonempty finite subsets
$B$ of $E$ are uniformly bounded in $V$. It is easy to see that this is a vector space
with respect to pointwise addition and scalar multiplication, and that

(27.1)    $\|f\|_{Y(E,V)} = \sup\left\{ \left\| \sum_{x \in B} f(x) \right\| : B \subseteq E, \ B \ne \emptyset, \text{ and } B \text{ has only finitely many elements} \right\}$

is a norm or p-norm on $Y(E, V)$, as appropriate. Note that each $f \in Y(E, V)$
is bounded, and that

(27.2)    $\sup_{x \in E} \|f(x)\| \le \|f\|_{Y(E,V)}$.

Let $Y_0(E, V)$ be the set of $V$-valued functions $f(x)$ on $E$ such that $\sum_{x \in E} f(x)$
satisfies the generalized Cauchy criterion. It is easy to see that this is a closed
linear subspace of $Y(E, V)$. If $f(x) = 0$ for all but finitely many $x \in E$, then
$f \in Y_0(E, V)$, and in fact $Y_0(E, V)$ is the same as the closure in $Y(E, V)$ of the
linear subspace of $V$-valued functions on $E$ with finite support. If $V$ is complete,
then $Y_0(E, V)$ is also the same as the collection of $V$-valued functions $f(x)$ on
$E$ such that $\sum_{x \in E} f(x)$ converges in the generalized sense.

If $\|v\|$ is a norm on $V$ and $\|f(x)\|$ is summable on $E$, or if $\|v\|$ is a p-norm
on $V$ and $\|f(x)\|$ is p-summable on $E$, $0 < p < 1$, then $f \in Y(E, V)$, and

(27.3)    $\|f\|_{Y(E,V)}^p \le \sum_{x \in E} \|f(x)\|^p$,

with $p = 1$ in the case of a norm. Furthermore, $f \in Y_0(E, V)$ under these
conditions. Conversely, if $V = \mathbf{R}$ and $f \in Y(E, \mathbf{R})$, then $f$ is summable on $E$, and

(27.4)    $\sum_{x \in E} |f(x)| \le 2 \, \|f\|_{Y(E,\mathbf{R})}$.

More precisely,

(27.5)    $\sum_{f(x) > 0} f(x), \ \sum_{f(x) < 0} -f(x) \le \|f\|_{Y(E,\mathbf{R})}$.

Similarly, if $V = \mathbf{C}$ and $f \in Y(E, \mathbf{C})$, then $f$ is summable on $E$, and

(27.6)    $\sum_{x \in E} |f(x)| \le 4 \, \|f\|_{Y(E,\mathbf{C})}$.

In this case, the real and imaginary parts $\mathrm{Re} f$, $\mathrm{Im} f$ of $f$ are in $Y(E, \mathbf{R})$, and
satisfy

(27.7)    $\|\mathrm{Re} f\|_{Y(E,\mathbf{R})}, \ \|\mathrm{Im} f\|_{Y(E,\mathbf{R})} \le \|f\|_{Y(E,\mathbf{C})}$.

This implies the desired estimate for the $\ell^1$ norm of $f$, which is less than or
equal to the sum of the $\ell^1$ norms of the real and imaginary parts of $f$.
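
On a finite set $E$ the supremum over finite subsums can be computed by brute force, which makes the constants 2 and 4 easy to test. The following sketch assumes a finite index set and scalar values; `y_norm_bruteforce` and `y_norm_real` are illustrative names, and the closed form in the real case is the one suggested by the estimate for the positive and negative parts.

```python
from itertools import combinations

def y_norm_bruteforce(values):
    """sup of |sum over B| over all nonempty subsets B of a finite index set."""
    best = 0.0
    for r in range(1, len(values) + 1):
        for subset in combinations(values, r):
            best = max(best, abs(sum(subset)))
    return best

def y_norm_real(values):
    """Real case: the larger of the sum of positive terms and the sum of |negative terms|."""
    pos = sum(v for v in values if v > 0)
    neg = sum(-v for v in values if v < 0)
    return max(pos, neg)

real_values = [1.5, -2.0, 0.25, -0.75, 3.0]
complex_values = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]

l1_real = sum(abs(v) for v in real_values)
l1_complex = sum(abs(v) for v in complex_values)
print(y_norm_bruteforce(real_values), l1_real)
print(y_norm_bruteforce(complex_values), l1_complex)
```

The brute-force search is exponential in the size of $E$, so it is only meant for small examples.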

28 Uniform boundedness 

Let $V$ be a real or complex vector space with a norm or p-norm $\|v\|$, and take
$E = \mathbf{Z}_+$. Thus a $V$-valued function on $E$ is basically the same as a sequence
with terms in $V$, and $Y(\mathbf{Z}_+, V)$ can be identified with a linear subspace of $X(V)$.
Also, $Y_0(\mathbf{Z}_+, V)$ corresponds to a linear subspace of $X_0(V)$ with respect to this
identification, and the $X(V)$ norm is less than or equal to the $Y(\mathbf{Z}_+, V)$ norm.
By definition, $Y(\mathbf{Z}_+, V)$, $Y_0(\mathbf{Z}_+, V)$, and the $Y(\mathbf{Z}_+, V)$ norm are invariant under
one-to-one mappings of $\mathbf{Z}_+$ onto itself, while $X(V)$, $X_0(V)$, and the $X(V)$ norm
are not invariant under rearrangements.

Suppose that $\{v_j\}_{j=1}^\infty$ is a sequence of elements of $V$ such that $\{v_{\pi(j)}\}_{j=1}^\infty$ is
an element of $X(V)$ for every one-to-one mapping $\pi$ from $\mathbf{Z}_+$ onto itself, and
let us show that $\{v_j\}_{j=1}^\infty$ corresponds to an element of $Y(\mathbf{Z}_+, V)$. This would
be immediate if we also asked that the $X(V)$ norm of $\{v_{\pi(j)}\}_{j=1}^\infty$ be uniformly
bounded, independently of $\pi$. If $\{v_j\}_{j=1}^\infty$ does not correspond to an element of
$Y(\mathbf{Z}_+, V)$, then there is a sequence of finite subsets $B_1, B_2, \ldots$ of $E$ such that

(28.1)    $\left\| \sum_{j \in B_n} v_j \right\| \to \infty$ as $n \to \infty$.

One can also argue a bit more to get the $B_n$'s to be pairwise disjoint. This
permits us to choose $\pi$ so that $B_n = \{\pi(k_n), \ldots, \pi(l_n)\}$ for some $k_n, l_n \in \mathbf{Z}_+$
with $k_n \le l_n$ and every $n$. Hence $\{v_{\pi(j)}\}_{j=1}^\infty \notin X(V)$, as desired. Of course,
the analogous statement for the generalized Cauchy criterion was discussed in
an earlier section.



29 Uniform boundedness, 2 

Let $M$ be a metric space, and let $\mathcal{A}$ be a collection of continuous real or complex-valued
functions on $M$. Suppose that $\mathcal{A}$ is pointwise bounded on $M$, in the sense
that

(29.1)    $\mathcal{A}(x) = \{f(x) : f \in \mathcal{A}\}$

is a bounded set in $\mathbf{R}$ or $\mathbf{C}$, as appropriate, for each $x \in M$. Put

(29.2)    $A_n = \{x \in M : |f(x)| \le n \text{ for each } f \in \mathcal{A}\}$,

so that $A_n$ is a closed set in $M$ for each $n$, by continuity, and

(29.3)    $\bigcup_{n=1}^\infty A_n = M$,

by pointwise boundedness. If $M$ is complete, then the Baire category theorem
implies that $A_n$ contains a nonempty open set in $M$ for some $n$.

Suppose now that $V$ is a real or complex vector space with a norm or p-norm,
and that $\mathcal{A}$ is a collection of bounded linear functionals on $V$. If $\mathcal{A}$ is bounded
pointwise on $V$ and $V$ is complete, then $\mathcal{A}$ is uniformly bounded on a nonempty
open set in $V$, as in the previous paragraph. Using linearity, one can check that
the elements of $\mathcal{A}$ have uniformly bounded dual norms. This is a version of
the Banach-Steinhaus theorem, or uniform boundedness principle. Of course,
$\mathcal{A}$ is uniformly bounded on bounded subsets of $V$ when the dual norms of the
elements of $\mathcal{A}$ are uniformly bounded.
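
The linearity step can be spelled out as a short computation. In the sketch below, the center $v_0$, the radius $r$, and the scaling parameter $t$ are names introduced here, and it is assumed that the set where every functional in the collection is bounded by $n$ contains the open ball of radius $r$ about $v_0$; the dual norm is taken to be the supremum of $|\lambda(v)|$ over $\|v\| \le 1$.

```latex
% Assumption: |\lambda(v)| \le n for every \lambda in the collection
% whenever \|v - v_0\| < r, where r > 0.
% Step 1: for u \in V with \|u\| < r,
|\lambda(u)| = |\lambda(v_0 + u) - \lambda(v_0)|
            \le |\lambda(v_0 + u)| + |\lambda(v_0)| \le 2n .
% Step 2: if \|v\| \le 1 and t > 1, then \|(r/t) v\| < r by homogeneity, so
|\lambda(v)| = \frac{t}{r} \, \bigl|\lambda\bigl((r/t) v\bigr)\bigr| \le \frac{2 n t}{r} .
% Letting t \to 1 gives the uniform bound
\|\lambda\|_* \le \frac{2n}{r} .
```

Only the homogeneity of the norm or p-norm is used in the second step, which is why the same argument applies in both cases.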

Now let $W$ be a real or complex vector space with a norm $\|w\|$, and let $K$
be a subset of $W$. Suppose that

(29.4)    $K(\lambda) = \{\lambda(w) : w \in K\}$

is a bounded set in $\mathbf{R}$ or $\mathbf{C}$, as appropriate, for each bounded linear functional $\lambda$
on $W$. Each $w \in W$ determines a bounded linear functional on $W^*$, which sends
$\lambda \in W^*$ to its value $\lambda(w)$ at $w$. Dual spaces are automatically complete, and
so the boundedness of $K(\lambda)$ for each $\lambda \in W^*$ implies that the linear functionals
$\lambda \mapsto \lambda(w)$ corresponding to $w \in K$ have uniformly bounded dual norm on $W^*$,
as in the preceding paragraph. It follows that $K$ is a bounded set in $W$, by the
Hahn-Banach theorem.

30 Sums and linear functionals



Let $E$ be a nonempty set, and let $V$ be a real or complex vector space with
a norm or p-norm $\|v\|$. If $f(x)$ is a $V$-valued function on $E$ with uniformly
bounded finite subsums, then $\lambda(f(x))$ has the same property for each bounded
linear functional $\lambda$ on $V$. Moreover,

(30.1)    $\|\lambda \circ f\|_{Y(E,\mathbf{R})}$ or $\|\lambda \circ f\|_{Y(E,\mathbf{C})} \le \|\lambda\|_* \, \|f\|_{Y(E,V)}$,

as appropriate. This implies that $\lambda(f(x))$ is summable on $E$, with

(30.2)    $\sum_{x \in E} |\lambda(f(x))| \le 2 \, \|\lambda\|_* \, \|f\|_{Y(E,V)}$

in the real case, and

(30.3)    $\sum_{x \in E} |\lambda(f(x))| \le 4 \, \|\lambda\|_* \, \|f\|_{Y(E,V)}$

in the complex case.
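
Conversely,

(30.4)    $\left| \lambda\left( \sum_{x \in B} f(x) \right) \right| = \left| \sum_{x \in B} \lambda(f(x)) \right| \le \sum_{x \in B} |\lambda(f(x))|$

for every finite set $B \subseteq E$ and $\lambda \in V^*$. Suppose that $\lambda(f(x))$ is summable on
$E$ for each $\lambda \in V^*$, and that

(30.5)    $\sum_{x \in E} |\lambda(f(x))| \le C \, \|\lambda\|_*$

for some $C \ge 0$ and every $\lambda \in V^*$. If $\|v\|$ is a norm on $V$, then the Hahn-Banach
theorem implies that

(30.6)    $\left\| \sum_{x \in B} f(x) \right\| \le C$

for every finite set $B \subseteq E$. Hence $f \in Y(E, V)$ and

(30.7)    $\|f\|_{Y(E,V)} \le C$

under these conditions.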

Let $K$ be the set of vectors in $V$ of the form $\sum_{x \in B} f(x)$, where $B \subseteq E$ is a
finite set. If $\lambda(f(x))$ is summable on $E$ for some $\lambda \in V^*$, then the set $K(\lambda)$ as
in (29.4) is bounded. If $\lambda(f(x))$ is summable on $E$ for every $\lambda \in V^*$, and if $\|v\|$
is a norm on $V$, then it follows that $K$ is a bounded set in $V$, as in the previous
section. This is the same as saying that $f \in Y(E, V)$.

31 Seminorms

Let $V$ be a vector space over the real or complex numbers. A nonnegative
real-valued function $N(v)$ on $V$ is said to be a seminorm if

(31.1)    $N(t v) = |t| \, N(v)$

for every $v \in V$ and $t \in \mathbf{R}$ or $\mathbf{C}$, as appropriate, and

(31.2)    $N(v + w) \le N(v) + N(w)$

for every $v, w \in V$. Thus a seminorm $N(v)$ is a norm exactly when $N(v) > 0$
for every $v \in V$ with $v \ne 0$. As another class of examples, $N_\lambda(v) = |\lambda(v)|$ is a
seminorm on $V$ when $\lambda$ is a linear functional on $V$. Observe that

(31.3)    $\{v \in V : N(v) = 0\}$

is a linear subspace of $V$ when $N(v)$ is a seminorm on $V$.

Let $\mathcal{N}$ be a collection of seminorms on $V$. Let us say that $U \subseteq V$ is an
open set with respect to $\mathcal{N}$ if for every $u \in U$ there are finitely many seminorms
$N_1, \ldots, N_l \in \mathcal{N}$ and positive real numbers $r_1, \ldots, r_l$ such that

(31.4)    $\{v \in V : N_j(u - v) < r_j, \ j = 1, \ldots, l\} \subseteq U$.

It is easy to see that this defines a topology on $V$. Note that this topology is
Hausdorff if and only if $\mathcal{N}$ satisfies the positivity condition that for each $v \in V$
with $v \ne 0$ there is an $N \in \mathcal{N}$ such that $N(v) > 0$. If $\mathcal{N}$ consists of a single
norm, then this is the usual topology associated to the norm.

Suppose that $V$ is equipped with a norm or p-norm $\|v\|_V$, and consider the
collection of seminorms on $V$ of the form $N_\lambda(v) = |\lambda(v)|$, where $\lambda \in V^*$. The
topology on $V$ associated to this collection of seminorms is known as the weak
topology. If $\|v\|_V$ is a norm on $V$, then the Hahn-Banach theorem implies that
for each $v \in V$ with $v \ne 0$ there is a $\lambda \in V^*$ such that $\lambda(v) \ne 0$. Thus $N_\lambda(v) > 0$,
and so the weak topology on $V$ is Hausdorff when $\|v\|_V$ is a norm. Note that
open subsets of $V$ with respect to the weak topology are open with respect to
$\|v\|_V$, because the linear functionals being used are bounded.

Now let $W$ be a real or complex vector space with a norm or p-norm $\|w\|_W$,
and consider $V = W^*$. Each $w \in W$ determines a linear functional $\lambda \mapsto \lambda(w)$
on $W^*$, and hence a seminorm $N^*_w(\lambda) = |\lambda(w)|$ on $W^*$. The topology on $W^*$
defined by this collection of seminorms is known as the weak* topology. This
topology is automatically Hausdorff, but it is helpful for $\|w\|_W$ to be a norm on
$W$ so that there are plenty of bounded linear functionals on $W$. Note that every
open set in $W^*$ with respect to the weak* topology is also open with respect to
the dual norm on $W^*$.

32 Sums in dual spaces

Let $E$ be a nonempty set, let $W$ be a real or complex vector space with a norm
or p-norm $\|w\|$, and let $f$ be a function on $E$ with values in the dual $W^*$ of $W$.
Suppose that $f(x)(w)$ is a summable function on $E$ for every $w \in W$, where
$f(x)(w)$ refers to the value of $f(x) \in W^*$ at $w$, and that

(32.1)    $\sum_{x \in E} |f(x)(w)| \le C \, \|w\|$

for some $C \ge 0$ and every $w \in W$. In this case, $\sum_{x \in E} f(x)(w)$ defines a bounded
linear functional on $W$ with dual norm $\le C$. One can also say that $\sum_{x \in E} f(x)$
converges in the generalized sense with respect to the weak* topology on $W^*$
under these conditions.

This estimate also implies that

(32.2)    $\left| \sum_{x \in B} f(x)(w) \right| \le C \, \|w\|$

for every finite set $B \subseteq E$ and $w \in W$, which is to say that

(32.3)    $\left\| \sum_{x \in B} f(x) \right\|_* \le C$

for every finite set $B \subseteq E$. Thus $f \in Y(E, W^*)$, and

(32.4)    $\|f\|_{Y(E,W^*)} \le C$.

Conversely, if $f \in Y(E, W^*)$, then $f(x)(w)$ is summable on $E$ for every $w \in W$,
with $\ell^1$ norm bounded by $2 \, \|f\|_{Y(E,W^*)} \, \|w\|$ in the real case and by
$4 \, \|f\|_{Y(E,W^*)} \, \|w\|$ in the complex case. If $W$ is complete and $f(x)(w)$ is
summable on $E$ for every $w \in W$, then one can use the uniform boundedness
principle to conclude that $f \in Y(E, W^*)$.



33 Seminorms, 2 

Let $V$ be a vector space over the real or complex numbers, and let $N_1, N_2, \ldots$
be a sequence of seminorms on $V$ such that for each $v \in V$ with $v \ne 0$ there is
a positive integer $j$ for which $N_j(v) > 0$. Under these conditions, one can check
that

(33.1)    $d(v, w) = \max\{\min(N_j(v - w), 1/j) : j \in \mathbf{Z}_+\}$

defines a metric on $V$, and that the topology on $V$ determined by this metric is
the same as the one associated to this sequence of seminorms as in Section 31.
Conversely, if the topology on $V$ determined by a collection $\mathcal{N}$ of seminorms
on $V$ is metrizable, then it is Hausdorff, and there is a countable local base for
the topology at 0. Using the latter, one can show that there is a sub-collection
of $\mathcal{N}$ with only finitely or countably many elements that determines the same
topology on $V$.
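
As a concrete instance of this metric, one can take $V$ to be finite sequences of real numbers and $N_j(v) = |v_j|$, the absolute value of the $j$th coordinate. This choice of seminorms, and the function name `metric`, are assumptions made here for illustration only.

```python
def metric(v, w):
    """d(v, w) = max over j of min(N_j(v - w), 1/j), with N_j(v) = |v_j| and j = 1, 2, ..."""
    return max((min(abs(a - b), 1.0 / j)
                for j, (a, b) in enumerate(zip(v, w), start=1)), default=0.0)

u = (0.0, 0.0, 0.0)
v = (5.0, 0.1, 0.0)
w = (5.0, 3.0, -2.0)

# The truncation by 1/j keeps late coordinates from dominating the distance.
print(metric(u, v), metric(v, w), metric(u, w))
```

The maximum is attained because the $j$th term is at most $1/j$, so only finitely many terms can exceed any given positive number.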

Suppose now that $V$ is equipped with a norm or p-norm $\|v\|$, and consider
the weak topology on $V$. Suppose also that for each $v \in V$ with $v \ne 0$ there
is a $\lambda \in V^*$ such that $\lambda(v) \ne 0$, which follows from the Hahn-Banach theorem
when $\|v\|$ is a norm on $V$, and which implies that the weak topology on $V$
is Hausdorff. Suppose in addition that $V^*$ is separable, and let $\lambda_1, \lambda_2, \ldots$ be
a sequence of bounded linear functionals on $V$ whose linear span is dense in
$V^*$. Let $N_{\lambda_1}, N_{\lambda_2}, \ldots$ be the seminorms on $V$ corresponding to the $\lambda_j$'s as in
Section 31. Under these conditions, one can check that the topology induced on
a bounded set in $V$ by the weak topology is the same as the topology induced
by the seminorms $N_{\lambda_1}, N_{\lambda_2}, \ldots$, and hence is metrizable.

Similarly, we can consider the weak* topology on the dual of a vector space $V$
with a norm or p-norm. Suppose that $V$ is separable, so that there is a sequence
of vectors $v_1, v_2, \ldots$ in $V$ whose linear span is dense in $V$. Let $N^*_{v_1}, N^*_{v_2}, \ldots$ be
the seminorms on $V^*$ corresponding to the $v_j$'s, as in Section 31. If $K$ is a
bounded set in $V^*$ with respect to the dual norm, then one can again check that
the topology induced on $K$ by the weak* topology is the same as the topology
induced by the seminorms $N^*_{v_1}, N^*_{v_2}, \ldots$, and is therefore metrizable.

Note that the unit ball

(33.2)    $B^* = \{\lambda \in V^* : \|\lambda\|_* \le 1\}$

in the dual $V^*$ of $V$ is closed with respect to the weak* topology. To see this,
it is convenient to describe $B^*$ as the set of $\lambda \in V^*$ such that

(33.3)    $|\lambda(v)| \le 1$

for every $v \in V$ with $\|v\| \le 1$. The Banach-Alaoglu theorem states that $B^*$ is
actually compact with respect to the weak* topology. If $V$ is separable, then
the topology induced on $B^*$ by the weak* topology on $V^*$ is metrizable, as in
the previous paragraph. In this case, compactness of $B^*$ in the weak* topology
is equivalent to sequential compactness.



34 Isometric embeddings 

Let $(M, d(x, y))$ be a metric space. It is easy to check that

(34.1)    $f_p(x) = d(p, x)$

is a continuous function on $M$ for each $p \in M$, using the triangle inequality. If
$M$ is bounded, then $f_p$ is also a bounded function on $M$. Thus $p \mapsto f_p$ defines
a mapping from $M$ into the space $C_b(M)$ of bounded continuous real-valued
functions on $M$. Using the triangle inequality, one can show that this is an
isometric embedding of $M$ into $C_b(M)$ with the supremum norm.
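
On a finite metric space the isometry property can be verified by direct computation. The sketch below assumes a small set of points in the plane with the Euclidean distance; the point set and the helper name `sup_distance` are illustrative choices, not part of the text.

```python
import math
from itertools import product

def sup_distance(points, d, p, q):
    """sup over x in M of |f_p(x) - f_q(x)|, where f_p(x) = d(p, x), for finite M."""
    return max(abs(d(p, x) - d(q, x)) for x in points)

points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0), (6.0, -2.5)]
d = math.dist  # Euclidean distance, available in Python 3.8 and later

# The triangle inequality gives "<=", and taking x = q gives equality.
gaps = [abs(sup_distance(points, d, p, q) - d(p, q))
        for p, q in product(points, repeat=2)]
print(max(gaps))
```

The same computation works unchanged for any distance function `d` on a finite set, since only the triangle inequality and $d(q, q) = 0$ are used.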

If $M$ is not bounded, then one can pick a basepoint $p_0 \in M$, and put

(34.2)    $\widetilde{f}_p = f_p - f_{p_0}$.

Using the triangle inequality again, one can check that $\widetilde{f}_p$ is a bounded function
on $M$ for each $p \in M$. Moreover, $p \mapsto \widetilde{f}_p$ is an isometric embedding of $M$ into
$C_b(M)$ for the same reasons as before, since

(34.3)    $\widetilde{f}_p - \widetilde{f}_q = f_p - f_q$

for every $p, q \in M$.

Suppose now that $V$ is a real or complex vector space with a norm $\|v\|$, and
let $B^*$ be the closed unit ball in the dual space $V^*$, as in (33.2). Each $v \in V$
determines a bounded linear functional on $V^*$ defined by

(34.4)    $L_v(\lambda) = \lambda(v)$,

which can also be considered as a bounded continuous function on $B^*$ with
respect to the topology induced by the weak* topology. Thus $v \mapsto L_v$ defines
a linear mapping from $V$ into the space $C(B^*)$ of continuous real or complex-valued
functions on $B^*$ with respect to the weak* topology, as appropriate. By
the Banach-Alaoglu theorem, $B^*$ is a compact Hausdorff space with respect to
this topology. Using the Hahn-Banach theorem, it is easy to see that $v \mapsto L_v$
is also an isometry from $V$ into $C(B^*)$, with respect to the supremum norm on
$C(B^*)$.



Part II 

Functions, measures, and paths 

35 Uniform boundedness, 3 

Let $(X, \mathcal{A})$ be a measurable space, which is to say a set $X$ with a $\sigma$-algebra $\mathcal{A}$
of measurable subsets of $X$, and let $\rho$ be a nonnegative real-valued function on
$\mathcal{A}$. Suppose that for every sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable
subsets of $X$,

(35.1)    $\rho\left( \bigcup_{j=1}^\infty A_j \right) \le \sum_{j=1}^\infty \rho(A_j) < \infty$.

This implies that $\rho(\emptyset) = 0$, by taking $A_j = \emptyset$ for each $j$.

Let $B_1, B_2, \ldots$ be a decreasing sequence of measurable subsets of $X$, so that
$B_{j+1} \subseteq B_j$ for each $j$, and put $B_\infty = \bigcap_{j=1}^\infty B_j$. Thus $A_j = B_j \setminus B_{j+1}$ is a
sequence of pairwise-disjoint measurable subsets of $X$ which are also disjoint
from $B_\infty$, and

(35.2)    $B_n = \left( \bigcup_{j=n}^\infty A_j \right) \cup B_\infty$

for each $n$. In particular, $\sum_{j=1}^\infty \rho(A_j)$ converges, which implies that $\rho(B_n)$ is
uniformly bounded in $n$, since

(35.3)    $\rho(B_n) \le \sum_{j=n}^\infty \rho(A_j) + \rho(B_\infty)$

for each $n$. If $B_\infty = \emptyset$, then (35.3) implies that $\{\rho(B_n)\}_{n=1}^\infty$ converges to 0. If
$C_1, C_2, \ldots$ is an increasing sequence of measurable subsets of $X$, then a similar
argument shows that $\rho(C_n)$ is uniformly bounded in $n$, but we shall not need
this here.

If $A \subseteq X$ is measurable, then put

(35.4)    $\rho^*(A) = \sup\left\{ \sum_{j=1}^\infty \rho(A_j) : A_1, A_2, \ldots \text{ are pairwise-disjoint measurable subsets of } X \text{ such that } A = \bigcup_{j=1}^\infty A_j \right\}$.

We would like to show that $\rho^*(A) < \infty$ under these conditions. Equivalently,
one can check that

(35.5)    $\rho^*(A) = \sup\left\{ \sum_{j=1}^n \rho(A_j) : A_1, \ldots, A_n \text{ are pairwise-disjoint measurable subsets of } X \text{ such that } A = \bigcup_{j=1}^n A_j \right\}$.

More precisely, the second definition of $\rho^*(A)$ is clearly less than or equal to
the first definition, because a partition of $A$ into finitely many measurable sets
can be extended to an infinite partition using the empty set. To show that
the first definition of $\rho^*(A)$ is less than or equal to the second definition, one
can approximate an infinite partition $A_1, A_2, \ldots$ of $A$ by the finite partitions
consisting of the sets $A_1, \ldots, A_n$ and $\bigcup_{j=n+1}^\infty A_j$ for each $n$.

If $B_1, B_2, \ldots$ is a sequence of pairwise-disjoint measurable subsets of $X$, then

(35.6)    $\sum_{l=1}^\infty \rho^*(B_l) \le \rho^*\left( \bigcup_{l=1}^\infty B_l \right)$,

because partitions of the $B_l$'s can be combined to get a partition of $\bigcup_{l=1}^\infty B_l$.
Similarly,

(35.7)    $\rho^*\left( \bigcup_{l=1}^\infty B_l \right) \le \sum_{l=1}^\infty \rho^*(B_l)$,

because every measurable partition $\{E_j\}_{j=1}^\infty$ of $\bigcup_{l=1}^\infty B_l$ can be refined to get
a partition $\{E_j \cap B_l\}_{j,l=1}^\infty$ which is a combination of partitions of the $B_l$'s.
Countable subadditivity implies that $\rho(E_j)$ is less than or equal to the sum of
$\rho(E_j \cap B_l)$ over $l$ for each $j$, so that the sum of $\rho(E_j)$ over $j$ is less than or equal
to the sum of $\rho(E_j \cap B_l)$ over $j$ and $l$. The sum of $\rho(E_j \cap B_l)$ over $j$ is less than
or equal to $\rho^*(B_l)$ for each $l$, and so the sum of $\rho(E_j \cap B_l)$ over $j$ and $l$ is less
than or equal to the sum of $\rho^*(B_l)$ over $l$, as desired. Therefore

(35.8)    $\rho^*\left( \bigcup_{l=1}^\infty B_l \right) = \sum_{l=1}^\infty \rho^*(B_l)$,

which means that $\rho^*$ is countably additive.

Suppose for the sake of a contradiction that $\rho^*(A) = \infty$ for some measurable
set $A \subseteq X$. This implies that there is a finite sequence of pairwise-disjoint
measurable subsets $A_{1,1}, \ldots, A_{1,n_1}$ of $X$ such that

(35.9)    $A = \bigcup_{l=1}^{n_1} A_{1,l}$

and

(35.10)    $\sum_{l=1}^{n_1} \rho(A_{1,l}) > 1$.

We also have that $\rho^*(A_{1,j}) = \infty$ for some $j$, since

(35.11)    $\rho^*(A) = \rho^*(A_{1,1}) + \cdots + \rho^*(A_{1,n_1})$,

and so we can relabel the indices, if necessary, to get that

(35.12)    $\rho^*(A_{1,n_1}) = \infty$.

This permits us to repeat the process, to get a finite sequence $A_{2,1}, \ldots, A_{2,n_2}$ of
pairwise-disjoint measurable subsets of $X$ such that

(35.13)    $A_{1,n_1} = \bigcup_{l=1}^{n_2} A_{2,l}$

and

(35.14)    $\sum_{l=1}^{n_2} \rho(A_{2,l}) > 2$.

As before, $\rho^*(A_{2,j}) = \infty$ for some $j$, and we can relabel the indices if necessary
to get that $\rho^*(A_{2,n_2}) = \infty$. Continuing in this way, we get a finite sequence
$A_{k,1}, \ldots, A_{k,n_k}$ of pairwise-disjoint measurable subsets of $X$ for each positive
integer $k$ such that

(35.15)    $\bigcup_{l=1}^{n_k} A_{k,l} = A_{k-1,n_{k-1}}$

when $k \ge 2$,

(35.16)    $\sum_{l=1}^{n_k} \rho(A_{k,l}) > k$,

and $\rho^*(A_{k,n_k}) = \infty$.

However,

(35.17)    $\sum_{k=1}^\infty \sum_{l=1}^{n_k - 1} \rho(A_{k,l}) < \infty$,

because the $A_{k,l}$'s are pairwise disjoint when $l < n_k$. Hence the sums

(35.18)    $\sum_{l=1}^{n_k - 1} \rho(A_{k,l})$

are uniformly bounded in $k$, and even converge to 0 as $k \to \infty$. By construction,
$A_{k+1,n_{k+1}} \subseteq A_{k,n_k}$ for each $k$, and so $\rho(A_{k,n_k})$ is also uniformly bounded in $k$,
as mentioned earlier in the section. This implies that the sums

(35.19)    $\sum_{l=1}^{n_k} \rho(A_{k,l}) = \sum_{l=1}^{n_k - 1} \rho(A_{k,l}) + \rho(A_{k,n_k})$

are uniformly bounded in $k$ as well. This contradicts (35.16), and we conclude
that $\rho^*(A) < \infty$ for every measurable set $A \subseteq X$.

Of course,

(35.20)    $\rho(A) \le \rho^*(A)$

for every measurable set $A \subseteq X$, and in fact $\rho^*$ is the smallest countably-additive
measure with this property. More precisely, if $\nu$ is a countably-additive measure
such that $\rho(A) \le \nu(A)$ for every measurable set $A \subseteq X$, then $\rho^*(A) \le \nu(A)$
for each $A$. This follows directly from the definition of $\rho^*(A)$. Observe too
that the hypothesis that $\sum_{j=1}^\infty \rho(A_j)$ converges when $A_1, A_2, \ldots$ is a sequence of
pairwise-disjoint measurable sets is necessary in order to have a finite measure
$\nu$ such that $\rho(A) \le \nu(A)$.



36 Real and complex measures 

Let $(X, \mathcal{A})$ be a measurable space, and let $\mu$ be a real or complex measure on
this space. This means that $\mu$ is a real or complex-valued function on $\mathcal{A}$ such
that

(36.1)    $\mu\left( \bigcup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty \mu(A_j)$

for every sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of $X$. More
precisely, the convergence of the series $\sum_{j=1}^\infty \mu(A_j)$ is part of the definition. It
follows that the series converges absolutely, because every rearrangement of the
series is of the same type. Note that $\mu(\emptyset) = 0$ is also implied by the definition,
by taking $A_j = \emptyset$. If $\rho(A) = |\mu(A)|$, then it is easy to see that $\rho(A)$ satisfies
the conditions described in the previous section. Hence $\rho^*(A)$ is a countably-additive
finite measure, which is commonly denoted $|\mu|(A)$.

In the real case, $\mu$ is also known as a signed measure on $X$, and it is easy to
see that

(36.2)    $\mu^+(A) = \frac{|\mu|(A) + \mu(A)}{2}$,    $\mu^-(A) = \frac{|\mu|(A) - \mu(A)}{2}$

are finite nonnegative measures on $X$. Note that

(36.3)    $\mu(A) = \mu^+(A) - \mu^-(A)$

and

(36.4)    $|\mu|(A) = \mu^+(A) + \mu^-(A)$

for each measurable set $A \subseteq X$. Similarly, if $\mu$ is a complex measure on $X$, then
$\mu$ can be expressed as a linear combination of finite nonnegative measures on
$X$, by applying this argument to the real and imaginary parts of $\mu$.

There are a number of simplifications that can be made in the previous
section when $\rho(A) = |\mu(A)|$ for a real measure $\mu$ on $X$. The first simplification
is to replace the earlier definition of $\rho^*(A)$ with

(36.5)    $\rho^*(A) = \sup\{|\mu(B)| + |\mu(C)| : B, C \in \mathcal{A}, \ A = B \cup C, \ B \cap C = \emptyset\}$.

The right side is clearly less than or equal to the earlier definition of $\rho^*(A)$. To
show the opposite inequality, let $\{A_j\}_{j=1}^\infty$ be any sequence of pairwise-disjoint
measurable subsets of $X$ such that $A = \bigcup_{j=1}^\infty A_j$. If $B$ is the union of the $A_j$'s
with $\mu(A_j) \ge 0$ and $C$ is the union of the $A_j$'s with $\mu(A_j) < 0$, then $A = B \cup C$,
$B \cap C = \emptyset$, and

(36.6)    $\sum_{j=1}^\infty |\mu(A_j)| = \mu(B) - \mu(C) = |\mu(B)| + |\mu(C)|$.

This implies that the earlier definition of $\rho^*(A)$ is less than or equal to the right
side of (36.5), by taking the supremum over all such sequences $\{A_j\}_{j=1}^\infty$. In the
same way, we also have that

(36.7)    $\rho^*(A) = \sup\{\mu(B) - \mu(C) : B, C \in \mathcal{A}, \ A = B \cup C, \ B \cap C = \emptyset\}$.
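
When $X$ is a finite set and the measure is given by point weights, both descriptions of the supremum can be evaluated by exhaustive search and compared. This is a minimal sketch under those assumptions; the weights are made up for illustration, and the helper names `mu`, `partitions`, `rho_star`, and `two_piece_sup` are introduced here.

```python
from itertools import combinations

def mu(weights, subset):
    """Value of the signed measure with the given point weights on a finite subset."""
    return sum(weights[x] for x in subset)

def partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def rho_star(weights, A):
    """sup over finite partitions of A of the sum of |mu(A_j)|."""
    return max(sum(abs(mu(weights, block)) for block in part)
               for part in partitions(list(A)))

def two_piece_sup(weights, A):
    """sup of mu(B) - mu(C) over splittings of A into disjoint pieces B and C."""
    A = list(A)
    best = float("-inf")
    for r in range(len(A) + 1):
        for B in combinations(A, r):
            C = [x for x in A if x not in B]
            best = max(best, mu(weights, B) - mu(weights, C))
    return best

weights = {"a": 2.0, "b": -3.0, "c": 1.0, "d": -0.5}
print(rho_star(weights, weights), two_piece_sup(weights, weights))
```

In this finite setting both suprema equal the sum of the absolute values of the weights, which is what one expects of the total variation.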

This makes it much easier to show that $\rho^*(A) < \infty$. If $\rho^*(A) = \infty$ for some
measurable set $A \subseteq X$, then there are disjoint measurable sets $B$, $C$ such that
$A = B \cup C$ and $\mu(B) - \mu(C)$ is as large as we want. Of course,

(36.8)    $\mu(A) = \mu(B) + \mu(C)$,

which implies that both $|\mu(B)|$ and $|\mu(C)|$ are as large as we want. Because $\rho^*$
is subadditive, we also have that $\rho^*(B) = \infty$ or $\rho^*(C) = \infty$. Put $A_1 = B$ if
$\rho^*(B) = \infty$, and otherwise $A_1 = C$. Repeating the process, we get a decreasing
sequence $\{A_l\}_{l=1}^\infty$ of measurable subsets of $X$ such that $\rho^*(A_l) = \infty$ for each $l$
and $|\mu(A_l)| \to \infty$ as $l \to \infty$. This contradicts the fact that $|\mu(A_l)|$ is bounded
when $A_{l+1} \subseteq A_l$ for each $l$, as in the previous section. One can also use the fact
that $\{\mu(A_l)\}_{l=1}^\infty$ converges under these conditions, and hence is bounded, which
is based on a similar argument. It follows that $\rho^*(A) < \infty$ when $\rho(A) = |\mu(A)|$
for a complex measure $\mu$, by considering the real and imaginary parts of $\mu$.
In the real case, we can combine (136.7[) and (|36.8j) to get that 

(36.9) n + {A) = sup{/x(S) : B e A, B C A}. 

We may restrict our attention to B C A such that n{B) > here, since B = 
has these properties. Similarly, 

(36.10) ft- (A) = sup{- fi(C) : C G A, C C A}. 






If $\mu_1$ is a nonnegative real measure on $X$ such that $\mu(A) \le \mu_1(A)$ for every
measurable set $A \subseteq X$, then

(36.11) $\mu^+(A) \le \mu_1(A)$

for every $A \in \mathcal{A}$. More precisely, this uses the fact that

(36.12) $\mu_1(A) = \mu_1(B) + \mu_1(A \setminus B) \ge \mu_1(B)$

when $B \subseteq A$, because $\mu_1(A \setminus B) \ge 0$. Similarly, if $\mu_2$ is a nonnegative real
measure on $X$ such that $\mu(A) \ge -\mu_2(A)$ for every measurable set $A \subseteq X$, then

(36.13) $\mu^-(A) \le \mu_2(A)$

for every $A \in \mathcal{A}$. Of course, $\mu_1 = \mu^+$ and $\mu_2 = \mu^-$ have these properties, by
construction.

If $\mu_1$, $\mu_2$ are finite nonnegative real measures on $X$ such that

(36.14) $\mu(A) = \mu_1(A) - \mu_2(A)$

for every measurable set $A \subseteq X$, then

(36.15) $-\mu_2(A) \le \mu(A) \le \mu_1(A)$

for every $A \in \mathcal{A}$. Thus $\mu_1$ and $\mu_2$ satisfy (36.11) and (36.13), respectively, as in
the preceding paragraph. As before, $\mu_1 = \mu^+$, $\mu_2 = \mu^-$ have this property, by
construction.

Suppose that $P$, $Q$ are disjoint measurable subsets of $X$ such that $P \cup Q = X$
and

(36.16) $|\mu|(X) = \mu(P) - \mu(Q)$.

This is the same as saying that the supremum in (36.7) is attained when $A = X$,
with $B = P$ and $C = Q$. If $E$ is a measurable subset of $P$ such that $\mu(E) < 0$,
then

(36.17) $\mu(P) = \mu(P \setminus E) + \mu(E) < \mu(P \setminus E)$

and

(36.18) $\mu(Q) > \mu(Q) + \mu(E) = \mu(Q \cup E)$,

which implies that

(36.19) $\mu(P) - \mu(Q) < \mu(P \setminus E) - \mu(Q \cup E)$,

contradicting maximality. Thus $\mu(E) \ge 0$ for every measurable set $E \subseteq P$, and
similarly $\mu(E) \le 0$ for every measurable set $E \subseteq Q$. Using this, one can check
that

(36.20) $\mu^+(A) = \mu(A \cap P)$, $\mu^-(A) = -\mu(A \cap Q)$

for every measurable set $A \subseteq X$, which is to say that the suprema in (36.9) and
(36.10) are attained with $B = A \cap P$ and $C = A \cap Q$.

The Hahn decomposition theorem states that there are disjoint measurable
subsets $P$, $Q$ of $X$ such that $P \cup Q = X$ and (36.20) holds for every measurable
set $A \subseteq X$. One way to prove this is to show that the supremum in (36.7) is
attained when $A = X$, as in the next paragraph. Another way is to use the
Radon-Nikodym theorem, discussed in Section 38.
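
As a small sanity check of the Hahn decomposition, one can look at a signed measure on a finite set, where everything reduces to finite sums. The following sketch is only an illustration under that assumption: the point weights are hypothetical, and the suprema in (36.9) and (36.10) are computed by brute force over all subsets.

```python
from itertools import combinations

def measure(weights, subset):
    """Signed measure of a subset: the sum of the point weights."""
    return sum(weights[x] for x in subset)

def subsets(points):
    """All subsets of a finite collection of points."""
    points = list(points)
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            yield set(combo)

# Hypothetical signed point weights on X = {0, 1, 2, 3, 4}.
weights = {0: 3, 1: -2, 2: 5, 3: -1, 4: 0}
X = set(weights)

# Candidate Hahn decomposition: P carries the nonnegative weights, Q the rest.
P = {x for x in X if weights[x] >= 0}
Q = X - P

def mu_plus(A):
    """Supremum of mu(B) over subsets B of A, as in (36.9)."""
    return max(measure(weights, B) for B in subsets(A))

def mu_minus(A):
    """Supremum of -mu(C) over subsets C of A, as in (36.10)."""
    return max(-measure(weights, C) for C in subsets(A))

A = {0, 1, 3, 4}
print(mu_plus(A), measure(weights, A & P))     # supremum vs. value on A ∩ P
print(mu_minus(A), -measure(weights, A & Q))   # supremum vs. value on A ∩ Q
```

In this finite setting the suprema are attained on $A \cap P$ and $A \cap Q$ for every subset $A$, which is the content of the decomposition.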

Suppose that $\{B_j\}_{j=1}^\infty$, $\{C_j\}_{j=1}^\infty$ are sequences of measurable subsets of $X$
such that $B_j \cap C_j = \emptyset$ and $B_j \cup C_j = X$ for each $j$, and

(36.21) $\lim_{j \to \infty} (\mu(B_j) - \mu(C_j)) = |\mu|(X)$.

Observe that

(36.22) $|\mu|(X) - (\mu(B_j) - \mu(C_j)) = 2\,(\mu^-(B_j) + \mu^+(C_j))$

for each $j$, because $|\mu|(X) = |\mu|(B_j) + |\mu|(C_j)$. Hence

(36.23) $\lim_{j \to \infty} \mu^-(B_j) = \lim_{j \to \infty} \mu^+(C_j) = 0$.

Using this, one can show that $\{B_j\}_{j=1}^\infty$, $\{C_j\}_{j=1}^\infty$ are Cauchy sequences with
respect to the semimetric on $\mathcal{A}$ associated to $|\mu|$, as in an earlier section, and hence
converge. This is equivalent to saying that the sequences of their indicator
functions are Cauchy sequences in $L^1(X, |\mu|)$, and hence converge in $L^1(X, |\mu|)$
to indicator functions of measurable subsets of $X$. More precisely, (36.23) implies
that $\{B_j\}_{j=1}^\infty$ converges to the empty set with respect to $\mu^-$, and that
$\{C_j\}_{j=1}^\infty$ converges to the empty set with respect to $\mu^+$. This implies in turn
that $\{B_j\}_{j=1}^\infty$ converges to $X$ with respect to $\mu^+$, and that $\{C_j\}_{j=1}^\infty$ converges to
$X$ with respect to $\mu^-$, because $C_j = X \setminus B_j$ for each $j$. It follows that $\{B_j\}_{j=1}^\infty$,
$\{C_j\}_{j=1}^\infty$ are Cauchy sequences with respect to both $\mu^+$ and $\mu^-$, and are thus
Cauchy sequences with respect to $|\mu| = \mu^+ + \mu^-$. The limits of these sequences
correspond to measurable subsets $P$, $Q$ of $X$ that are determined up to sets of
$|\mu|$-measure $0$. By construction, $Q$ is the same as $X \setminus P$ up to a set of $|\mu|$-measure
$0$, and we may as well take $Q = X \setminus P$. We also have that $\mu^-(P) = \mu^+(Q) = 0$,
$|\mu|(X) = \mu(P) - \mu(Q)$, and so on.



37 Vector-valued measures

Let $(X, \mathcal{A})$ be a measurable space, and let $V$ be a real or complex vector space
with a norm $\|v\|$. More precisely, suppose that $V$ is a Banach space, which
means that $V$ is complete with respect to the metric associated to the norm.
Let $\mu$ be a $V$-valued function on $\mathcal{A}$ such that

(37.1) $\mu\big(\bigcup_{j=1}^\infty A_j\big) = \sum_{j=1}^\infty \mu(A_j)$

for every sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of $X$.
Again convergence of the sum

(37.2) $\sum_{j=1}^\infty \mu(A_j)$

is part of the hypothesis, which implies convergence of rearrangements of the
sum. However, in this case, absolute convergence

(37.3) $\sum_{j=1}^\infty \|\mu(A_j)\| < \infty$

is an additional condition. If we have absolute convergence, then $p(A) = \|\mu(A)\|$
satisfies the requirements of Section 35. This implies that $p^*$ is a
countably-additive finite nonnegative measure.

Let $\nu$ be a countably-additive finite nonnegative measure on $(X, \mathcal{A})$, and
take $V$ to be $L^q(X, \nu)$ for some $q$, $1 \le q < \infty$. Also let $1_A(x)$ be the indicator
function of $A \subseteq X$, equal to $1$ when $x \in A$ and to $0$ when $x \in X \setminus A$. If
$\mu(A) = 1_A$ for each measurable set $A \subseteq X$, then $\mu$ is an $L^q$-valued function
on $\mathcal{A}$ that satisfies the countable additivity condition described in the previous
paragraph. If $q = 1$, then $\mu$ also satisfies the absolute convergence condition.
This does not normally work when $q > 1$, even when $\nu$ is Lebesgue measure on
the unit interval.
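
To see the failure of absolute convergence for $q > 1$ numerically, note that the $L^q$ norm of an indicator function $1_A$ is $\nu(A)^{1/q}$. The sketch below assumes a particular (hypothetical) choice of disjoint subintervals $A_j$ of the unit interval with lengths $6/(\pi^2 j^2)$, which add up to $1$, and compares partial sums of the norms for $q = 1$ and $q = 2$.

```python
import math

def indicator_norm(length, q):
    """L^q norm of the indicator of a set of Lebesgue measure `length`."""
    return length ** (1.0 / q)

def partial_norm_sum(q, n_terms):
    """Sum of ||1_{A_j}||_q over the first n_terms sets A_j.

    The A_j are assumed to be disjoint subintervals of [0, 1] with
    lengths 6 / (pi^2 j^2), which add up to 1 over all j >= 1.
    """
    scale = 6.0 / math.pi ** 2
    return sum(indicator_norm(scale / j ** 2, q) for j in range(1, n_terms + 1))

for n in (10, 1000, 100000):
    # q = 1 stays below 1, while q = 2 behaves like a harmonic series.
    print(n, partial_norm_sum(1, n), partial_norm_sum(2, n))
```

For $q = 2$ the terms are constant multiples of $1/j$, so the partial sums grow without bound, although the series of indicator functions itself still converges in $L^2$.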

Let $\mu$ be an arbitrary $V$-valued function on $\mathcal{A}$ that satisfies the countable
additivity condition mentioned at the beginning of the section, not necessarily
with absolute convergence. If $\lambda$ is a bounded linear functional on $V$, then

(37.4) $\mu_\lambda(A) = \lambda(\mu(A))$

defines a real or complex measure on $(X, \mathcal{A})$, as appropriate. In particular,
$\mu_\lambda$ has finite total variation $|\mu_\lambda|$, and

(37.5) $|\mu_\lambda(A)| \le |\mu_\lambda|(A) \le |\mu_\lambda|(X)$

for every measurable set $A \subseteq X$. Thus

(37.6) $\{\lambda(\mu(A)) : A \in \mathcal{A}\}$

is a bounded set of real or complex numbers, as appropriate, for each $\lambda \in V^*$.
It follows that

(37.7) $\{\mu(A) : A \in \mathcal{A}\}$

is a bounded set in $V$, as discussed in an earlier section.

If $\alpha$ is a real measure on $(X, \mathcal{A})$, then

(37.8) $|\alpha|(X) \le 2 \sup\{|\alpha(A)| : A \in \mathcal{A}\}$,

because of (36.5). Similarly, if $\beta$ is a complex measure on $(X, \mathcal{A})$, then

(37.9) $|\beta|(X) \le 4 \sup\{|\beta(A)| : A \in \mathcal{A}\}$,

by applying (37.8) to the real and imaginary parts of $\beta$. If $\mu$ is a countably-additive
$V$-valued function on $\mathcal{A}$ and $\lambda$ is a bounded linear functional on $V$, as
in the previous paragraph, then

(37.10) $|\mu_\lambda(A)| = |\lambda(\mu(A))| \le \|\lambda\|_* \, \|\mu(A)\|$

for every measurable set $A \subseteq X$. Hence

(37.11) $|\mu_\lambda|(X) \le 2 \, \|\lambda\|_* \sup\{\|\mu(A)\| : A \in \mathcal{A}\}$

in the real case, and

(37.12) $|\mu_\lambda|(X) \le 4 \, \|\lambda\|_* \sup\{\|\mu(A)\| : A \in \mathcal{A}\}$

in the complex case.
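
The constants $2$ and $4$ in (37.8) and (37.9) can be checked by brute force on a finite set, where the total variation of a measure given by point weights is the sum of the absolute values of the weights. The weights below are hypothetical, and the sketch is only meant as a sanity check, not as part of the argument.

```python
from itertools import combinations

def all_subsets(points):
    """All subsets of a finite collection of points, as tuples."""
    points = list(points)
    for r in range(len(points) + 1):
        yield from combinations(points, r)

def total_variation(weights):
    """Total variation of the whole set for a measure given by point weights."""
    return sum(abs(w) for w in weights.values())

def sup_over_sets(weights):
    """Supremum of |measure(A)| over all subsets A of the finite set."""
    return max(abs(sum(weights[x] for x in A)) for A in all_subsets(weights))

# Hypothetical real and complex point weights on a four-point set.
real_weights = {0: 3.0, 1: -2.0, 2: 5.0, 3: -1.0}
complex_weights = {0: 1 + 0j, 1: 1j, 2: -1 + 0j, 3: -1j}

print(total_variation(real_weights), 2 * sup_over_sets(real_weights))
print(total_variation(complex_weights), 4 * sup_over_sets(complex_weights))
```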

If $\mu$ is a countably-additive $V$-valued function on $\mathcal{A}$ and $B_1, B_2, \ldots$ is an
increasing sequence of measurable subsets of $X$, then

(37.13) $\lim_{j \to \infty} \mu(B_j) = \mu\big(\bigcup_{j=1}^\infty B_j\big)$.

This follows from countable additivity by taking $A_1 = B_1$ and $A_j = B_j \setminus B_{j-1}$
when $j \ge 2$, as usual. Conversely, this continuity condition implies countable
additivity when $\mu$ is finitely additive, by taking $B_n = \bigcup_{j=1}^n A_j$. Similarly, if
$C_1, C_2, \ldots$ is a decreasing sequence of measurable subsets of $X$, then

(37.14) $\lim_{l \to \infty} \mu(C_l) = \mu\big(\bigcap_{l=1}^\infty C_l\big)$.

This is equivalent to (37.13) when $\mu$ is finitely additive, with $B_j = X \setminus C_j$.

Let us use these continuity conditions to give another proof of the fact that
$\mu$ is bounded, like the one for real measures in the previous section. Put

(37.15) $\widetilde{\mu}(A) = \sup\{\|\mu(B)\| : B \in \mathcal{A},\ B \subseteq A\}$

for each measurable set $A \subseteq X$, which may be $+\infty$ a priori. Observe that

(37.16) $\widetilde{\mu}(A \cup A') \le \widetilde{\mu}(A) + \widetilde{\mu}(A')$

for any measurable sets $A, A' \subseteq X$. This is because any measurable subset $B$
of $A \cup A'$ can be expressed as the union of $B \cap A \subseteq A$ and $B \setminus A \subseteq A'$, which are
automatically disjoint. Thus $\mu(B)$ is the sum of $\mu(B \cap A)$ and $\mu(B \setminus A)$, so that
$\|\mu(B)\|$ is less than or equal to the sum of $\|\mu(B \cap A)\|$ and $\|\mu(B \setminus A)\|$, which is
less than or equal to the sum of $\widetilde{\mu}(A)$ and $\widetilde{\mu}(A')$, as desired.

Suppose for the sake of a contradiction that $\widetilde{\mu}(A) = +\infty$ for some measurable
set $A \subseteq X$. Hence there are measurable sets $B \subseteq A$ such that $\|\mu(B)\|$ is as large
as we want. Because $\mu(A)$ is equal to the sum of $\mu(B)$ and $\mu(A \setminus B)$, it follows
that $\|\mu(B)\|$ and $\|\mu(A \setminus B)\|$ can both be as large as we want at the same time.
Using the finite subadditivity of $\widetilde{\mu}$ discussed in the previous paragraph, we get
that $\widetilde{\mu}(B) = +\infty$ or $\widetilde{\mu}(A \setminus B) = +\infty$. By taking $C_1 = B$ or $A \setminus B$, as appropriate,
we get a measurable subset of $A$ such that $\widetilde{\mu}(C_1) = +\infty$ and $\|\mu(C_1)\|$ is as large as
we like. Repeating the process, we get a decreasing sequence of measurable sets
$C_1, C_2, \ldots$ such that $\widetilde{\mu}(C_l) = +\infty$ for each $l \ge 1$ and $\|\mu(C_l)\| \to \infty$ as $l \to \infty$.
This contradicts the fact that $\{\mu(C_l)\}_{l=1}^\infty$ converges in $V$ to $\mu\big(\bigcap_{l=1}^\infty C_l\big)$, by the
continuity condition that follows from countable additivity.

Let $E$ be a nonempty set, and let $f(x)$ be a $V$-valued function on $E$ such
that $\sum_{x \in E} f(x)$ converges in the generalized sense. In particular, $\sum_{x \in E} f(x)$
satisfies the generalized Cauchy condition, and so for each $\epsilon > 0$ there is a finite
set $B_\epsilon \subseteq E$ such that

(37.17) $\big\| \sum_{x \in C} f(x) \big\| < \epsilon$

for every nonempty finite set $C \subseteq E \setminus B_\epsilon$. It follows that $\sum_{x \in A} f(x)$ satisfies the
generalized Cauchy condition for every nonempty set $A \subseteq E$, since we can use
$A \cap B_\epsilon$ in place of $B_\epsilon$ for the sum over $A$. Hence $\sum_{x \in A} f(x)$ converges in the
generalized sense for every nonempty set $A \subseteq E$, because $V$ is complete. Put

(37.18) $\mu(A) = \sum_{x \in A} f(x)$

for each $A \subseteq E$, which is interpreted as being $0$ when $A = \emptyset$. It is easy to see
that this is a finitely-additive $V$-valued measure on the algebra of all subsets of
$E$. Note that

(37.19) $\|\mu(C)\| \le \epsilon$

for every $C \subseteq E \setminus B_\epsilon$, since we can reduce to the previous case by approximating
$C$ by finite sets. Using this, one can check that $\mu$ is countably-additive. If
$\|f(x)\|$ is a summable function on $E$, then $\mu$ satisfies the additional absolute
convergence condition mentioned at the beginning of the section.



38 The Radon-Nikodym theorem

Let $(X, \mathcal{A})$ be a measurable space, and let $\mu$, $\nu$ be finite nonnegative measures
on $(X, \mathcal{A})$ such that

(38.1) $\mu(A) \le C \, \nu(A)$

for some $C \ge 0$ and every measurable set $A \subseteq X$. A special case of the
Radon-Nikodym theorem states that there is a bounded nonnegative measurable
function $h$ on $X$ such that

(38.2) $\mu(A) = \int_A h \, d\nu$

for every measurable set $A \subseteq X$. Von Neumann's trick for showing this is to
observe first that

(38.3) $\lambda(f) = \int_X f \, d\mu$

is a bounded linear functional on $L^2(\nu)$. More precisely,

(38.4) $|\lambda(f)| \le \int_X |f| \, d\mu \le C \int_X |f| \, d\nu \le C \, \nu(X)^{1/2} \big(\int_X |f|^2 \, d\nu\big)^{1/2}$,

using our hypothesis on $\mu$ and $\nu$ in the second step, and the Cauchy-Schwarz
inequality in the third step. Because $L^2(X, \nu)$ is a Hilbert space, the Riesz
representation theorem implies that there is an $h \in L^2(X, \nu)$ such that

(38.5) $\lambda(f) = \int_X f \, h \, d\nu$

for every $f \in L^2(X, \nu)$. Hence

(38.6) $\mu(A) = \lambda(1_A) = \int_A h \, d\nu$

for every measurable set $A \subseteq X$. It follows that

(38.7) $h(x) \le C$

almost everywhere on $X$ with respect to $\nu$ under these conditions.

Instead of (38.1), suppose now that $\mu(A) = 0$ for every measurable set
$A \subseteq X$ such that $\nu(A) = 0$. In this case, $\mu$ is said to be absolutely continuous
with respect to $\nu$, denoted $\mu \ll \nu$. The Radon-Nikodym theorem states that
there is then a nonnegative measurable function $h$ on $X$ such that (38.2) holds
for every measurable set $A \subseteq X$. More precisely, $h$ is also integrable with respect
to $\nu$, because

(38.8) $\int_X h \, d\nu = \mu(X) < \infty$.

To see this, we apply the previous version to $\mu$ and $\nu_1 = \mu + \nu$, since

(38.9) $\mu(A) \le \mu(A) + \nu(A) = \nu_1(A)$

for every measurable set $A \subseteq X$ trivially. This leads to a real-valued measurable
function $h_1$ on $X$ such that $0 \le h_1 \le 1$ and

(38.10) $\mu(A) = \int_A h_1 \, d\nu_1$

for every measurable set $A$. If

(38.11) $B = \{x \in X : h_1(x) = 1\}$,

then $B$ is measurable, and

(38.12) $\mu(B) = \nu_1(B) = \mu(B) + \nu(B)$,

which implies that $\nu(B) = 0$, and hence $\mu(B) = 0$. Thus $h_1 < 1$ $\nu_1$-almost
everywhere, and one may as well take $h_1$ so that $0 \le h_1 < 1$ everywhere on $X$.
If $A \subseteq X$ is measurable, then

(38.13) $\mu(A) = \int_A h_1 \, d\mu + \int_A h_1 \, d\nu$

implies that

(38.14) $\int_A (1 - h_1) \, d\mu = \int_A h_1 \, d\nu$,

and one can show that (38.2) holds with $h = h_1/(1 - h_1)$. More precisely,

(38.15) $\int_X g \, (1 - h_1) \, d\mu = \int_X g \, h_1 \, d\nu$

for every bounded measurable function $g$ on $X$, because of (38.14). If $h_1 \le 1 - \delta$
on $A$ for some $\delta > 0$, then one can take $g = 1/(1 - h_1)$ on $A$, $g = 0$ on $X \setminus A$,
to get (38.2). One can then use countable additivity to get (38.2) for arbitrary
measurable sets $A$.
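
The passage from $h_1$ to $h = h_1/(1 - h_1)$ can be followed concretely on a finite set, where measures are given by point masses and integrals are finite sums. The sketch below uses hypothetical point masses and exact rational arithmetic; it is only an illustration of the construction, with $\nu$ assumed positive at every point so that $\mu$ is absolutely continuous with respect to $\nu$.

```python
from fractions import Fraction

# Hypothetical point masses on X = {"a", "b", "c"}; nu is positive at every
# point, so mu is absolutely continuous with respect to nu.
mu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3)}
nu = {"a": Fraction(2), "b": Fraction(5), "c": Fraction(1)}

# nu_1 = mu + nu dominates mu, and h_1 is the density of mu relative to nu_1.
nu1 = {x: mu[x] + nu[x] for x in mu}
h1 = {x: mu[x] / nu1[x] for x in mu}

# Density of mu relative to nu, obtained as h = h_1 / (1 - h_1).
h = {x: h1[x] / (1 - h1[x]) for x in mu}

def integral(f, measure, A):
    """Integral of f over A for a measure given by point masses."""
    return sum(f[x] * measure[x] for x in A)

A = {"a", "c"}
print(integral(h, nu, A), sum(mu[x] for x in A))  # the two values agree
```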

If $\mu$ is a real or complex measure on $(X, \mathcal{A})$, and not necessarily positive,
then $\mu$ is still said to be absolutely continuous with respect to $\nu$ when $\mu(A) = 0$
for every measurable set $A \subseteq X$ such that $\nu(A) = 0$. This is equivalent to
the condition that the total variation measure $|\mu|$ be absolutely continuous with
respect to $\nu$, which implies that $\mu$ can be expressed as a linear combination of
finite nonnegative measures on $X$ that are absolutely continuous with respect
to $\nu$. It follows from the previous case that there is a real or complex-valued
integrable function $h$ on $X$ with respect to $\nu$ for which (38.2) holds. One can
also allow $\nu$ to be $\sigma$-finite, by decomposing the domain into a countable union
of pairwise-disjoint measurable sets of finite $\nu$-measure. It is better to do this
first when $\mu$ is nonnegative, to get the integrability of the density $h$, and then
deal with real or complex measures $\mu$.

Note that $h$ is determined $\nu$-almost everywhere by $\mu$. More precisely, if $h$ is
a real or complex-valued integrable function on $X$ with respect to $\nu$ such that

(38.16) $\int_A h \, d\nu = 0$

for every measurable set $A \subseteq X$, then $h(x) = 0$ for almost every $x \in X$ with
respect to $\nu$. In the real case, one can simply take $A$ to be the set where $h(x) > 0$
or $h(x) < 0$. The complex case follows from the real case, by considering the
real and imaginary parts of $h$ separately. If $h'$, $h''$ are integrable functions on
$X$ with respect to $\nu$ such that

(38.17) $\int_A h' \, d\nu = \int_A h'' \, d\nu$

for every measurable set $A \subseteq X$, then it follows that $h = h' - h''$ is equal to $0$
almost everywhere on $X$ with respect to $\nu$.

Of course, any real or complex measure $\mu$ on $X$ is absolutely continuous with
respect to the corresponding total variation measure $|\mu|$. The Radon-Nikodym
theorem implies that there is an integrable function $h$ on $X$ with respect to $|\mu|$
such that

(38.18) $\mu(A) = \int_A h \, d|\mu|$

for every measurable set $A \subseteq X$. Clearly

(38.19) $|\mu(A)| \le \int_A |h| \, d|\mu|$

for every measurable set $A \subseteq X$, which implies that

(38.20) $|\mu|(A) \le \int_A |h| \, d|\mu|$,

since the right side is a nonnegative measure on $X$. It follows that $|h(x)| \ge 1$
for almost every $x \in X$ with respect to $|\mu|$, and we would like to check that
$|h(x)| = 1$ almost everywhere on $X$.

If $\mu$ is real and $A_1 = \{x \in X : h(x) > 1\}$ has positive $|\mu|$-measure, then

(38.21) $\mu(A_1) = \int_{A_1} h \, d|\mu| > |\mu|(A_1) \ge \mu(A_1)$,

a contradiction. Thus $|\mu|(A_1) = 0$, and $|\mu|(\{x \in X : h(x) < -1\}) = 0$ for
similar reasons. In the complex case, put $A_\alpha = \{x \in X : \mathrm{Re}(\alpha \, h(x)) > 1\}$ for
each $\alpha \in \mathbf{C}$ with $|\alpha| = 1$. If $|\mu|(A_\alpha) > 0$ for some $\alpha$, then

(38.22) $|\mu(A_\alpha)| \ge \mathrm{Re}(\alpha \, \mu(A_\alpha)) = \int_{A_\alpha} \mathrm{Re}(\alpha \, h) \, d|\mu| > |\mu|(A_\alpha) \ge |\mu(A_\alpha)|$,

which is a contradiction again. This shows that $|\mu|(A_\alpha) = 0$ for every complex
number $\alpha$ with $|\alpha| = 1$. Let $\{\alpha_j\}_{j=1}^\infty$ be a sequence of complex numbers with
$|\alpha_j| = 1$ for each $j$ which is dense in the unit circle in $\mathbf{C}$, such as an enumeration
of the points on the circle that correspond to angles that are rational multiples
of $2\pi$. If $x \in X$ and $|h(x)| > 1$, then $x \in A_{\alpha_j}$ when $\alpha_j$ is sufficiently close to
$\overline{h(x)}/|h(x)|$. Equivalently,

(38.23) $\{x \in X : |h(x)| > 1\} = \bigcup_{j=1}^\infty A_{\alpha_j}$,

and so $|\mu|(\{x \in X : |h(x)| > 1\}) = 0$, as desired. In particular, $h(x) = \pm 1$
almost everywhere on $X$ with respect to $|\mu|$ in the real case, which implies the
Hahn decomposition, as in Section 36.

39 The Lebesgue decomposition 

Let $(X, \mathcal{A})$ be a measurable space, and let $\mu$ and $\nu$ be positive finite measures
on $X$. If $\nu_1 = \mu + \nu$, then $\mu \le \nu_1$, and there is a real-valued measurable function
$h_1$ on $X$ that satisfies $0 \le h_1 \le 1$ and (38.10), as before. Let $B$ be as in (38.11),
so that $B$ is measurable and satisfies (38.12), which implies that $\nu(B) = 0$.
However, without the additional hypothesis of absolute continuity of $\mu$ with
respect to $\nu$, we do not necessarily have that $\mu(B) = 0$. Instead, let $\mu'$, $\mu''$ be
the measures defined by

(39.1) $\mu'(A) = \mu(A \cap B)$, $\mu''(A) = \mu(A \cap (X \setminus B))$.

By construction, $\mu'$ and $\nu$ are mutually singular, in the sense that $\nu(B) = 0$ and
$\mu'(X \setminus B) = 0$. We still have (38.13), (38.14), and (38.15), which imply that

(39.2) $\mu''(A) = \int_{A \cap (X \setminus B)} \frac{h_1}{1 - h_1} \, d\nu$

for every measurable set $A \subseteq X$. In particular, $\mu''$ is absolutely continuous
with respect to $\nu$. Of course, $\mu = \mu' + \mu''$, which is known as the Lebesgue
decomposition of $\mu$. If $\mu$ is a real or complex measure on $X$, then an analogous
decomposition can be obtained by applying this argument to $|\mu|$ in place of $\mu$.
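
On a finite set the Lebesgue decomposition can be computed directly from the recipe above. The following sketch uses hypothetical point masses, with $\nu$ vanishing at two points where $\mu$ does not, and assumes that $\mu + \nu$ is positive at every point so that $h_1$ is defined pointwise.

```python
from fractions import Fraction

# Hypothetical point masses on X = {"a", "b", "c", "d"}.
mu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3), "d": Fraction(2)}
nu = {"a": Fraction(2), "b": Fraction(5), "c": Fraction(0), "d": Fraction(0)}

# Density h_1 of mu relative to nu_1 = mu + nu (assumed positive everywhere).
h1 = {x: mu[x] / (mu[x] + nu[x]) for x in mu}

# B is the set where h_1 = 1; nu vanishes there.
B = {x for x in mu if h1[x] == 1}

# Singular part mu' lives on B, absolutely continuous part mu'' on X \ B.
mu_sing = {x: (mu[x] if x in B else Fraction(0)) for x in mu}
mu_ac = {x: (mu[x] if x not in B else Fraction(0)) for x in mu}

# Density of mu'' relative to nu on X \ B, namely h_1 / (1 - h_1).
density = {x: h1[x] / (1 - h1[x]) for x in mu if x not in B}

print(sorted(B))
print({x: density[x] * nu[x] for x in density}, mu_ac)
```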

40 The Riesz representation theorem 

Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $1 \le p, q \le \infty$ be conjugate exponents,
so that $1/p + 1/q = 1$. If $f \in L^p(X)$ and $g \in L^q(X)$, then the integral version
of Hölder's inequality implies that $f \, g \in L^1(X)$, and that

(40.1) $\|f \, g\|_1 \le \|f\|_p \, \|g\|_q$.

The proof is basically the same as for sums, as in an earlier section. It follows that

(40.2) $\lambda_g(f) = \int_X f \, g \, d\mu$

defines a bounded linear functional on $L^p(X)$ when $g \in L^q(X)$, with dual norm
less than or equal to $\|g\|_q$. If $p = \infty$, then it is easy to see that the dual norm of
$\lambda_g$ is equal to $\|g\|_1$, by choosing $f \in L^\infty(X)$ such that $\|f\|_\infty = 1$ and $f \, g = |g|$.
Similarly, if $1 < p < \infty$, then the dual norm of $\lambda_g$ on $L^p(X)$ is equal to $\|g\|_q$,
because there is an $f \in L^p(X)$ such that $f \, g = |f|^p = |g|^q$. The dual norm of $\lambda_g$
on $L^1(X)$ is also equal to $\|g\|_\infty$, under an additional hypothesis. More precisely,
we should ask that for each measurable set $A \subseteq X$ with $\mu(A) > 0$ there is a
measurable set $B \subseteq A$ such that $0 < \mu(B) < \infty$. This condition holds when $\mu$
is $\sigma$-finite on $X$, and for counting measure on any set $X$. If $0 \le t < \|g\|_\infty$, then
we can apply this to $A_t = \{x \in X : |g(x)| > t\}$ to get a measurable set $B_t \subseteq A_t$
with $0 < \mu(B_t) < \infty$. Put $f_t(x) = g(x)/|g(x)|$ for every $x \in B_t$ when $g$ is
real-valued, $f_t(x) = \overline{g(x)}/|g(x)|$ for every $x \in B_t$ when $g$ is complex-valued, and
$f_t(x) = 0$ for every $x \in X \setminus B_t$ in both cases. It is easy to see that $f_t \in L^1(X)$,
$\|f_t\|_1 = \mu(B_t)$, and $\lambda_g(f_t) = \int_{B_t} |g| \, d\mu \ge t \, \mu(B_t)$, which implies that the dual norm of $\lambda_g$
on $L^1(X)$ is greater than or equal to $t$. It follows that the dual norm of $\lambda_g$ on
$L^1(X)$ is greater than or equal to $\|g\|_\infty$, since this holds for every nonnegative
real number $t$ such that $t < \|g\|_\infty$. Hence the dual norm of $\lambda_g$ on $L^1(X)$ is
equal to $\|g\|_\infty$, since we already know that it is less than or equal to $\|g\|_\infty$.
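
For $1 < p < \infty$, the extremal function with $f \, g = |f|^p = |g|^q$ can be taken to be $f = \mathrm{sign}(g) \, |g|^{q-1}$ when $g$ is real-valued, since $p \, (q - 1) = q$. The sketch below checks this numerically for counting measure on a finite set; the vector `g` and the exponent `p` are hypothetical choices made only for illustration.

```python
import math

def norm(values, p):
    """The l^p norm for counting measure on a finite set, 1 <= p < infinity."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def pairing(f, g):
    """lambda_g(f): the sum of f(x) g(x) over the finite set."""
    return sum(a * b for a, b in zip(f, g))

p = 3.0
q = p / (p - 1.0)          # conjugate exponent, 1/p + 1/q = 1
g = [2.0, -1.0, 0.5, 0.0, -3.0]

# Extremal f with f g = |f|^p = |g|^q, namely f = sign(g) |g|^(q - 1).
f = [math.copysign(abs(v) ** (q - 1.0), v) for v in g]

# The ratio lambda_g(f) / ||f||_p should match ||g||_q up to rounding.
ratio = pairing(f, g) / norm(f, p)
print(ratio, norm(g, q))
```

Any other choice of `f` gives a ratio of at most $\|g\|_q$ in absolute value, by Hölder's inequality.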
Conversely, every bounded linear functional on $L^p(X)$ can be realized in this
way when $1 < p < \infty$, and also when $p = 1$ and $X$ has $\sigma$-finite $\mu$-measure. To
see this, let us begin with the case where $\mu(X) < \infty$. Let $\lambda$ be a bounded linear
functional on $L^p(X)$, $1 \le p < \infty$, and put

(40.3) $\nu(A) = \lambda(1_A)$

for every measurable set $A \subseteq X$. Here $1_A$ denotes the indicator function on $X$
associated to $A$, equal to $1$ on $A$ and to $0$ on $X \setminus A$. If $A_1, A_2, \ldots$ is a sequence of
pairwise-disjoint measurable subsets of $X$, then $\sum_{j=1}^\infty 1_{A_j}$ converges in $L^p(X)$
to the indicator function associated to $\bigcup_{j=1}^\infty A_j$ when $p < \infty$, and hence

(40.4) $\nu\big(\bigcup_{j=1}^\infty A_j\big) = \sum_{j=1}^\infty \nu(A_j)$.

Thus $\nu$ is a real or complex measure on $X$, as appropriate. This measure is also
absolutely continuous with respect to $\mu$, since $1_A = 0$ in $L^p(X)$ when $\mu(A) = 0$.
The Radon-Nikodym theorem implies that there is a $g \in L^1(X)$ such that

(40.5) $\nu(A) = \int_A g \, d\mu$

for every measurable set $A \subseteq X$. By linearity, it follows that

(40.6) $\lambda(f) = \int_X f \, g \, d\mu$

for every measurable simple function $f$ on $X$. This also holds when $f$ is a
bounded measurable function on $X$, by approximating $f$ by simple functions. If
$p = 1$, then one can use this to show that $g \in L^\infty(X)$, with $L^\infty$ norm less than
or equal to the dual norm of $\lambda$ on $L^1(X)$, in the same way as in the previous
paragraph. If $p > 1$, then one can first show that the $L^q$ norm of the restriction
of $g$ to any set on which it is bounded is less than or equal to the dual norm of
$\lambda$ on $L^p(X)$, by the same type of argument as in the previous paragraph. This
implies that $g \in L^q(X)$, with $L^q$ norm less than or equal to the dual norm of $\lambda$
on $L^p(X)$. In both cases, one can then use the boundedness of $\lambda$ on $L^p(X)$ and
the fact that $g \in L^q(X)$ to show that (40.6) holds for every $f \in L^p(X)$,
because simple functions are dense in $L^p(X)$.

Suppose now that $X$ has $\sigma$-finite $\mu$-measure, so that there is a sequence of
measurable subsets $E_1, E_2, \ldots$ of $X$ such that $\mu(E_l) < \infty$ for each $l \ge 1$ and
$\bigcup_{l=1}^\infty E_l = X$. We may also suppose that $E_k \cap E_l = \emptyset$ when $k \ne l$, by replacing
$E_l$ with $E_l \setminus (E_1 \cup \cdots \cup E_{l-1})$ when $l > 1$. If $\lambda$ is a bounded linear functional on
$L^p(X)$, then the restriction of $\lambda$ to $f \in L^p(X)$ such that $f = 0$ on $X \setminus E_l$ defines
a bounded linear functional on $L^p(E_l)$ for each $l$. By the previous argument,
for each positive integer $l$, there is a $g_l \in L^q(E_l)$ such that

(40.7) $\lambda(f) = \int_{E_l} f \, g_l \, d\mu$

for every $f \in L^p(E_l)$. Let $g$ be the function on $X$ defined by $g = g_l$ on $E_l$ for
each $l$. Thus the restriction of $g$ to $\bigcup_{l=1}^n E_l$ is in $L^q$ for each $n$, and $\lambda(f)$ is equal
to the integral of $f$ times $g$ when $f \in L^p(X)$ and $f = 0$ on $X \setminus \big(\bigcup_{l=1}^n E_l\big)$. In
particular, the $L^q$ norm of the restriction of $g$ to $\bigcup_{l=1}^n E_l$ is less than or equal
to the dual norm of the restriction of $\lambda$ to $L^p\big(\bigcup_{l=1}^n E_l\big)$ for each $n$, which is
bounded by the dual norm of $\lambda$ on $L^p(X)$. This implies that $g \in L^q(X)$, with $L^q$
norm less than or equal to the dual norm of $\lambda$ on $L^p(X)$. Every $f \in L^p(X)$ can be
approximated in the $L^p$ norm by functions that are equal to $0$ on $X \setminus \big(\bigcup_{l=1}^n E_l\big)$
for some $n$, because $p < \infty$, and so $\lambda(f)$ is given by the integral of $f$ times $g$ for
every $f \in L^p(X)$.

If $1 < p < \infty$, then we can drop the hypothesis that $X$ be $\sigma$-finite. To
see this, let a bounded linear functional $\lambda$ on $L^p(X)$ be given. We may as
well suppose that $\lambda \ne 0$, since otherwise there is nothing to do. In particular,
$L^p(X) \ne \{0\}$, which is to say that there are measurable subsets of $X$ with
positive finite measure. If $Y \subseteq X$ is measurable and $\sigma$-finite, then there is a
$g_Y \in L^q(Y)$ such that

(40.8) $\lambda(f) = \int_Y f \, g_Y \, d\mu$

for every $f \in L^p(X)$ with $f = 0$ on $X \setminus Y$, by the previous argument. Moreover,
the $L^q$ norm of $g_Y$ is equal to the dual norm of the restriction of $\lambda$ to $L^p(Y)$,
which is less than or equal to the dual norm of $\lambda$ on $L^p(X)$. Let $f_1, f_2, \ldots$ be a
sequence of elements of $L^p(X)$ such that $\|f_j\|_p = 1$ for each $j$ and $\{|\lambda(f_j)|\}_{j=1}^\infty$
converges to the dual norm of $\lambda$ on $L^p(X)$. Observe that

(40.9) $Y_0 = \bigcup_{j=1}^\infty \{x \in X : f_j(x) \ne 0\}$

is a measurable set with $\sigma$-finite measure, because the set where $f_j \ne 0$ has
this property for each $j$. Hence there is a $g_{Y_0} \in L^q(Y_0)$ with the properties
mentioned earlier. By construction, the dual norm of $\lambda$ on $L^p(X)$ is equal to
the dual norm of the restriction of $\lambda$ to $L^p(Y_0)$, which is equal to the $L^q$ norm of
$g_{Y_0}$. If $Y \subseteq X$ is measurable and $\sigma$-finite, and if $Y_0 \subseteq Y$, then $g_Y = g_{Y_0}$ almost
everywhere on $Y_0$, by uniqueness of the representation. However, the $L^q$ norm
of $g_Y$ is less than or equal to the dual norm of $\lambda$ on $L^p(X)$, which is equal
to the $L^q$ norm of $g_{Y_0}$. This implies that $g_Y = 0$ almost everywhere on $Y \setminus Y_0$,
since $q < \infty$. Let $g$ be the function on $X$ equal to $g_{Y_0}$ on $Y_0$ and to $0$ on $X \setminus Y_0$.
If $f \in L^p(X)$, then the previous argument can be applied to

(40.10) $Y = Y_0 \cup \{x \in X : f(x) \ne 0\}$,

to get that $\lambda(f)$ is equal to the integral of $f$ times $g$, as desired.



41 Lengths of paths



Let $(M, d(x, y))$ be a metric space, and let $f$ be a function on a closed interval
$[a, b]$ in the real line with values in $M$. If $\mathcal{P} = \{t_j\}_{j=0}^n$ is a partition of $[a, b]$, in
the sense that

(41.1) $a = t_0 < t_1 < \cdots < t_n = b$,

then we put

(41.2) $\Lambda_a^b(\mathcal{P}) = \sum_{j=1}^n d(f(t_j), f(t_{j-1}))$.

Note that

(41.3) $d(f(a), f(b)) \le \Lambda_a^b(\mathcal{P})$,

because of the triangle inequality. Similarly,

(41.4) $\Lambda_a^b(\mathcal{P}) \le \Lambda_a^b(\mathcal{P}')$

when $\mathcal{P}'$ is another partition of $[a, b]$ that is a refinement of $\mathcal{P}$, which means
that $\mathcal{P}'$ includes the points in $\mathcal{P}$. The length of the path $f(t)$, $a \le t \le b$, is
defined to be the supremum $\Lambda_a^b$ of $\Lambda_a^b(\mathcal{P})$ over all partitions $\mathcal{P}$ of $[a, b]$, which may
be infinite.

Suppose that $a < r < b$, and that $\mathcal{P}_1$, $\mathcal{P}_2$ are partitions of $[a, r]$, $[r, b]$,
respectively. We can combine $\mathcal{P}_1$, $\mathcal{P}_2$ to get a partition $\mathcal{P}$ of $[a, b]$ that satisfies

(41.5) $\Lambda_a^r(\mathcal{P}_1) + \Lambda_r^b(\mathcal{P}_2) = \Lambda_a^b(\mathcal{P})$.

Thus

(41.6) $\Lambda_a^r(\mathcal{P}_1) + \Lambda_r^b(\mathcal{P}_2) \le \Lambda_a^b$,

which implies that

(41.7) $\Lambda_a^r + \Lambda_r^b \le \Lambda_a^b$,

by taking the supremum over all partitions $\mathcal{P}_1$, $\mathcal{P}_2$ of $[a, r]$, $[r, b]$. In the other
direction, if $\mathcal{P}$ is any partition of $[a, b]$, then $\mathcal{P}$ may or may not include $r$, but
we can add $r$ to $\mathcal{P}$ if necessary to get a refinement $\mathcal{P}'$ of $\mathcal{P}$ that does contain
$r$. This permits $\mathcal{P}'$ to be expressed as the combination of partitions $\mathcal{P}_1$, $\mathcal{P}_2$ of
$[a, r]$, $[r, b]$, respectively, so that

(41.8) $\Lambda_a^b(\mathcal{P}) \le \Lambda_a^b(\mathcal{P}') = \Lambda_a^r(\mathcal{P}_1) + \Lambda_r^b(\mathcal{P}_2)$.

Hence

(41.9) $\Lambda_a^b(\mathcal{P}) \le \Lambda_a^r + \Lambda_r^b$

for every partition $\mathcal{P}$ of $[a, b]$, and therefore

(41.10) $\Lambda_a^b \le \Lambda_a^r + \Lambda_r^b$.

Combining this with (41.7), we get that

(41.11) $\Lambda_a^b = \Lambda_a^r + \Lambda_r^b$.

In particular,

(41.12) $\Lambda_a^r \le \Lambda_a^b$

when $a < r < b$, which can be seen more directly by extending any partition of
$[a, r]$ to a partition of $[a, b]$.

The diameter of a nonempty set $E \subseteq M$ is defined by

(41.13) $\mathrm{diam}\, E = \sup\{d(x, y) : x, y \in E\}$,

which is finite exactly when $E$ is bounded. If $a \le r < t \le b$ and $\mathcal{P}$ is a partition
of $[a, b]$ that includes these points, then

(41.14) $d(f(r), f(t)) \le \Lambda_a^b(\mathcal{P}) \le \Lambda_a^b$.

It follows that

(41.15) $\mathrm{diam}\, f([a, b]) \le \Lambda_a^b$.

Note that $\Lambda_a^b = 0$ if and only if $f$ is constant.

Consider the special case where $M = \mathbf{R}$ and $f : [a, b] \to \mathbf{R}$ is monotone
increasing. If $\mathcal{P} = \{t_j\}_{j=0}^n$ is any partition of $[a, b]$, then

(41.16) $\Lambda_a^b(\mathcal{P}) = \sum_{j=1}^n (f(t_j) - f(t_{j-1})) = f(b) - f(a)$.

This implies that

(41.17) $\Lambda_a^b = f(b) - f(a)$.
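
The partition sums in this section are easy to experiment with numerically. The sketch below is an illustration only: the helper names are hypothetical, the first example path is the unit circle in the plane (whose length is $2\pi$), and the second is a monotone increasing function, for which every partition sum equals $f(b) - f(a)$.

```python
import math

def partition_length(f, partition, dist):
    """Sum of distances between consecutive values of f along the partition."""
    return sum(dist(f(t), f(s)) for s, t in zip(partition, partition[1:]))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * j / n for j in range(n + 1)]

def euclid(u, v):
    """Euclidean distance between two points in the plane."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

def absdiff(u, v):
    """Standard metric on the real line."""
    return abs(u - v)

def circle(t):
    """Hypothetical example path: the unit circle in the plane."""
    return (math.cos(t), math.sin(t))

def cube(t):
    """Hypothetical monotone increasing example on the real line."""
    return t ** 3

coarse = uniform_partition(0.0, 2 * math.pi, 8)
fine = uniform_partition(0.0, 2 * math.pi, 16)   # refines `coarse`
print(partition_length(circle, coarse, euclid))
print(partition_length(circle, fine, euclid))

# For a monotone increasing function the sum telescopes to f(b) - f(a).
print(partition_length(cube, uniform_partition(-1.0, 2.0, 7), absdiff))
```

Passing to the refinement can only increase the partition sum, in line with (41.4), and both sums stay below the length of the circle.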

42 Lipschitz mappings 

Let $(M_1, d_1(x, y))$ and $(M_2, d_2(u, v))$ be metric spaces. A mapping $f : M_1 \to M_2$
is said to be Lipschitz if there is a constant $k \ge 0$ such that

(42.1) $d_2(f(x), f(y)) \le k \, d_1(x, y)$

for every $x, y \in M_1$. Thus Lipschitz mappings are automatically uniformly
continuous, and $f$ is Lipschitz with $k = 0$ if and only if $f$ is constant.

If $M_2$ is the real line with the standard metric, then $f : M_1 \to \mathbf{R}$ is Lipschitz
with constant $k$ if and only if

(42.2) $f(x) \le f(y) + k \, d_1(x, y)$

for every $x, y \in M_1$. More precisely, (42.1) implies (42.2) directly, and to get
the converse, one can apply the latter both to $x$, $y$ and with the roles of $x$, $y$
exchanged. In particular,

(42.3) $f_p(x) = d_1(x, p)$

is Lipschitz with constant $1$ on $M_1$ for every $p \in M_1$, by the triangle inequality.
For example, $f(x) = |x|$ is Lipschitz with constant $1$ on the real line.

Suppose now that $f$ is a Lipschitz mapping with constant $k$ from a closed
interval $[a, b]$ in the real line with the standard metric into a metric space
$(M_2, d_2(u, v))$. If $\mathcal{P} = \{t_j\}_{j=0}^n$ is a partition of $[a, b]$, then

(42.4) $\Lambda_a^b(\mathcal{P}) = \sum_{j=1}^n d_2(f(t_j), f(t_{j-1})) \le \sum_{j=1}^n k \, (t_j - t_{j-1}) = k \, (b - a)$.

Thus $f$ has length $\Lambda_a^b \le k \, (b - a)$.
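
As a quick numerical illustration of the bound $k \, (b - a)$, the sketch below computes partition sums for a hypothetical real-valued example, $f(t) = |\sin(3t)|$ on $[0, 2]$, which is Lipschitz with constant $k = 3$. The function and the uniform partitions are assumptions made only for this example.

```python
import math

def partition_length(f, partition):
    """Partition sum for a real-valued f with the standard metric on R."""
    return sum(abs(f(t) - f(s)) for s, t in zip(partition, partition[1:]))

def f(t):
    """Hypothetical example: |sin(3t)| is Lipschitz with constant k = 3."""
    return abs(math.sin(3.0 * t))

a, b, k = 0.0, 2.0, 3.0
for n in (1, 4, 64, 1024):
    partition = [a + (b - a) * j / n for j in range(n + 1)]
    # Each partition sum is bounded by k * (b - a).
    print(n, partition_length(f, partition), k * (b - a))
```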

If $M_1$, $M_2$, and $M_3$ are metric spaces, and $f_1 : M_1 \to M_2$, $f_2 : M_2 \to M_3$ are
Lipschitz mappings with constants $k_1$, $k_2$, respectively, then their composition
$f_2 \circ f_1$ is a Lipschitz mapping from $M_1$ into $M_3$ with constant $k_1 k_2$. Similarly,
if $f_1 : [a, b] \to M_2$ has length $\Lambda_a^b$ and $f_2 : M_2 \to M_3$ is Lipschitz with constant
$k_2$, then $f_2 \circ f_1 : [a, b] \to M_3$ has length $\le k_2 \, \Lambda_a^b$.



43 Bounded variation 

A real-valued function $f$ on a closed interval $[a, b]$ in the real line is said to have
bounded variation if it has finite length as a mapping into $\mathbf{R}$ with the standard
metric. In this case, the length of $f$ is also known as its total variation. We can
also consider the positive and negative variations of $f$ separately, as follows.

For each real number $x$, put $x_+ = x$ when $x \ge 0$, $x_+ = 0$ when $x \le 0$,
$x_- = -x$ when $x \le 0$, and $x_- = 0$ when $x \ge 0$. Thus

(43.1) $x_+ + x_- = |x|$, $x_+ - x_- = x$

and

(43.2) $(x + y)_+ \le x_+ + y_+$, $(x + y)_- \le x_- + y_-$

for every $x, y \in \mathbf{R}$. If $\mathcal{P} = \{t_j\}_{j=0}^n$ is a partition of $[a, b]$, then put

(43.3) $P_a^b(\mathcal{P}) = \sum_{j=1}^n (f(t_j) - f(t_{j-1}))_+$

and

(43.4) $N_a^b(\mathcal{P}) = \sum_{j=1}^n (f(t_j) - f(t_{j-1}))_-$.

Note that

(43.5) $P_a^b(\mathcal{P}) + N_a^b(\mathcal{P}) = \Lambda_a^b(\mathcal{P})$

and

(43.6) $P_a^b(\mathcal{P}) - N_a^b(\mathcal{P}) = f(b) - f(a)$,

by (43.1). If $\mathcal{P}'$ is another partition of $[a, b]$ which is a refinement of $\mathcal{P}$, then it
is easy to see that

(43.7) $P_a^b(\mathcal{P}) \le P_a^b(\mathcal{P}')$, $N_a^b(\mathcal{P}) \le N_a^b(\mathcal{P}')$,

using (43.2).
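
The identities (43.5) and (43.6) for a single partition are purely arithmetic, and can be checked on sampled values. In the sketch below, `values` stands for hypothetical samples $f(t_0), \ldots, f(t_n)$ of a function along a partition; dropping some of the interior samples corresponds to passing to a coarser partition.

```python
def variations(values):
    """Return (P, N, Lambda) for consecutive samples f(t_0), ..., f(t_n)."""
    steps = [t - s for s, t in zip(values, values[1:])]
    pos = sum(max(d, 0) for d in steps)      # positive parts, as in (43.3)
    neg = sum(max(-d, 0) for d in steps)     # negative parts, as in (43.4)
    total = sum(abs(d) for d in steps)       # partition sum, as in (41.2)
    return pos, neg, total

# Hypothetical integer samples of f along a partition of [a, b].
values = [0, 3, 1, 4, 4, -2]
pos, neg, total = variations(values)
print(pos, neg, total)

# A coarser partition: keep only the samples with indices 0, 3, 5.
coarse_pos, coarse_neg, coarse_total = variations([values[0], values[3], values[5]])
print(coarse_pos, coarse_neg, coarse_total)
```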

Let $P_a^b$, $N_a^b$ be the suprema of $P_a^b(\mathcal{P})$, $N_a^b(\mathcal{P})$ over all partitions $\mathcal{P}$ of $[a, b]$,
respectively. Clearly

(43.8) $\Lambda_a^b \le P_a^b + N_a^b$,

by (43.5). To get the opposite inequality

(43.9) $P_a^b + N_a^b \le \Lambda_a^b$,

one should be a bit more careful, because the partitions $\mathcal{P}$ of $[a, b]$ for which
$P_a^b(\mathcal{P})$ approaches $P_a^b$ may not be the same as the partitions for which $N_a^b(\mathcal{P})$
approaches $N_a^b$. However, using common refinements of such partitions, one can
get partitions $\mathcal{P}$ such that $P_a^b(\mathcal{P})$, $N_a^b(\mathcal{P})$ approach $P_a^b$, $N_a^b$ at the same time.
This implies (43.9), from which it follows that

(43.10) $P_a^b + N_a^b = \Lambda_a^b$.

Observe also that

(43.11) $P_a^r + P_r^b = P_a^b$, $N_a^r + N_r^b = N_a^b$

for each $r$, $a < r < b$. This uses the same arguments as for $\Lambda_a^b$, in Section 41.

Suppose now that $f$ has bounded variation, so that $\Lambda_a^b < \infty$, and hence
$P_a^b, N_a^b < \infty$. Using (43.6), one can check that

(43.12) $P_a^b - N_a^b = f(b) - f(a)$.

More precisely, one should be careful to use partitions $\mathcal{P}$ of $[a, b]$ such that $P_a^b$,
$N_a^b$ are simultaneously approximated by $P_a^b(\mathcal{P})$, $N_a^b(\mathcal{P})$, respectively, as in the
previous paragraph. Similarly,

(43.13) $P_a^r - N_a^r = f(r) - f(a)$

for each $r \in [a, b]$, since the restriction of $f$ to $[a, r]$ also has bounded variation.
Of course, $P_a^r$ and $N_a^r$ are monotone increasing in $r$ on $[a, b]$.



44 Functions and measures 

Let $\alpha(x)$ be a monotone increasing real-valued function on the real line. As
usual, the one-sided limits $\alpha(x-) = \lim_{y \to x-} \alpha(y)$, $\alpha(x+) = \lim_{z \to x+} \alpha(z)$ exist
for every $x \in \mathbf{R}$, and are given by

(44.1) $\alpha(x-) = \sup\{\alpha(y) : y \in \mathbf{R},\ y < x\}$,

(44.2) $\alpha(x+) = \inf\{\alpha(z) : z \in \mathbf{R},\ x < z\}$.

Thus

(44.3) $\alpha(x-) \le \alpha(x) \le \alpha(x+)$

for every $x \in \mathbf{R}$, and $\alpha(x+) = \alpha(x-)$ exactly when $\alpha$ is continuous at $x$.
Moreover,

(44.4) $\alpha(x+) \le \alpha(y-)$

for every $x, y \in \mathbf{R}$ with $x < y$. Remember that the set of $x \in \mathbf{R}$ at which $\alpha$ is
not continuous has only finitely or countably many elements.

It is well known that there is a unique positive Borel measure μ_α on R that 
satisfies 

(44.5)    \mu_\alpha((a,b)) = \alpha(b-) - \alpha(a+), \qquad \mu_\alpha([a,b]) = \alpha(b+) - \alpha(a-) 

for every a, b ∈ R with a < b. The expression for closed intervals also makes 
sense when a = b, in which case it reduces to 

(44.6)    \mu_\alpha(\{a\}) = \alpha(a+) - \alpha(a-). 

Of course, this is equal to 0 when α is continuous at a. Alternatively, if f is a 
continuous real-valued function on the real line with compact support, then one 
can define the Riemann-Stieltjes integral 

(44.7)    \int_{-\infty}^{\infty} f(x) \, d\alpha(x). 

This is a nonnegative linear functional on the space of continuous functions 
with compact support on R, and the Riesz representation theorem leads to a 
positive Borel measure that is the same as μ_α. As another approach, if α is a 
strictly increasing continuous function on R, then one can get μ_α from Lebesgue 
measure using a change of variables. If α is monotone increasing and continuous, 
but perhaps not strictly increasing, then 

(44.8)    \beta(x) = \alpha(x) + x 

is continuous and strictly increasing, so the previous argument can be used to get 
μ_β, and one can get μ_α by subtracting Lebesgue measure from μ_β. If α is not 
continuous, then one can account for the discontinuities directly with sums of 
multiples of Dirac masses. 
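
As a small numerical illustration of the formulas for the measures of open intervals, closed intervals, and single points, the following sketch approximates the one-sided limits of a monotone function by sampling slightly to the right and left of a point. The function names, the offset `eps`, and the example function (the identity plus a unit jump at 0) are assumptions chosen for illustration and are not part of these notes.

```python
def right_limit(alpha, x, eps=1e-9):
    """Approximate alpha(x+) by sampling just to the right of x (eps is an assumed offset)."""
    return alpha(x + eps)


def left_limit(alpha, x, eps=1e-9):
    """Approximate alpha(x-) by sampling just to the left of x."""
    return alpha(x - eps)


def mu_open(alpha, a, b):
    """Measure of the open interval (a, b): alpha(b-) - alpha(a+)."""
    return left_limit(alpha, b) - right_limit(alpha, a)


def mu_closed(alpha, a, b):
    """Measure of the closed interval [a, b]: alpha(b+) - alpha(a-); a == b gives a point mass."""
    return right_limit(alpha, b) - left_limit(alpha, a)


def example_alpha(x):
    """Hypothetical example: x plus a unit jump at 0 (Lebesgue measure plus a Dirac mass at 0)."""
    return x + (1.0 if x >= 0 else 0.0)
```

With this example, the point 0 carries mass close to 1, a point of continuity such as 0.5 carries mass close to 0, and the open interval (-1, 1) has measure close to 3, up to the sampling error of order `eps`.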

Let us say that a real-valued function α on R has bounded variation if it 
has bounded variation on every closed interval [a, b], and if the total variation 
Λ_a^b of α on [a, b] is uniformly bounded. This implies that α is bounded on R, 
since 

(44.9)    |\alpha(a) - \alpha(b)| \le \Lambda_a^b 

for every a, b ∈ R with a < b. It is easy to see that bounded monotone functions 
on R have bounded variation. Conversely, one can check that a function with 
bounded variation on R can be expressed as a difference of monotone increasing 
functions that are bounded. Complex-valued functions of bounded variation on 
R can be defined analogously, and represented as linear combinations of bounded 
monotone real-valued functions. 

If α is a real or complex-valued function of bounded variation on R, then 
there is a real or complex Borel measure μ_α on R associated to α as 
before. More precisely, if α is given as a linear combination of bounded monotone 
increasing real-valued functions, then μ_α is the same as the corresponding linear 
combination of positive finite measures. In this case, the Riemann-Stieltjes 
integral (44.7) defines a bounded linear functional on the space of continuous 
functions on R with compact support with respect to the supremum norm, 
which leads to a real or complex Borel measure on R, as appropriate. 



45 Continuity conditions 

Let (M, d(x, y)) be a complete metric space, and let f : [a, b] → M be a path of 
finite length Λ_a^b. If {t_j}_{j=1}^∞ is a monotone sequence of elements of [a, b], then it 
is easy to see that 

(45.1)    \sum_{j=1}^n d(f(t_j), f(t_{j+1})) \le \Lambda_a^b 

for every positive integer n. This implies that \sum_{j=1}^∞ d(f(t_j), f(t_{j+1})) converges, 
and hence that {f(t_j)}_{j=1}^∞ converges in M, as in Section [TT]. Using this, one can 
check that f(r+) = lim_{t→r+} f(t) exists for every r ∈ [a, b), and similarly that 
f(r-) = lim_{t→r-} f(t) exists for every r ∈ (a, b]. More precisely, this also uses 
the observation that two strictly increasing or two strictly decreasing sequences 
with the same limit can be combined into a single monotone sequence, and hence 
that the corresponding sequences of values of f have the same limit in M. 

Alternatively, let Λ_u^v be the length of the restriction of f to [u, v] when 
a ≤ u ≤ v ≤ b. Of course, Λ_a^r is monotone increasing in r, and hence 

(45.2)    \lim_{t \to r-} \Lambda_a^t = \sup_{a \le t < r} \Lambda_a^t 

when a < r ≤ b. Let ε > 0 be given, and choose u ∈ [a, r) so that 

(45.3)    \Lambda_a^u > \sup_{a \le t < r} \Lambda_a^t - \epsilon. 

Because Λ_a^t = Λ_a^u + Λ_u^t when u ≤ t < r, we get that 

(45.4)    \sup_{u \le t < r} \Lambda_u^t < \epsilon. 

One can also use this to deal with f(r-), and similarly for f(r+) when a ≤ r < b. 
If a ≤ r ≤ t ≤ b, then 

(45.5)    d(f(r), f(t)) \le \Lambda_r^t, 

as usual. It follows that f is continuous from the right at r ∈ [a, b) when 

(45.6)    \lim_{t \to r+} \Lambda_r^t = 0, 

and that f is continuous from the left at r ∈ (a, b] when 

(45.7)    \lim_{t \to r-} \Lambda_t^r = 0. 

Equivalently, continuity of Λ_a^r from the right or the left implies continuity of 
f(r) from the right or the left at the same point, respectively. In particular, 
f(r) is continuous at every point where Λ_a^r is continuous, which includes all but 
at most finitely or countably many elements of [a, b], because Λ_a^r is monotone 
increasing in r. 

Conversely, Λ_a^r is continuous from the right or left at any point where f is 
continuous from the right or left. To see this, let r ∈ (a, b] and ε > 0 be given, 
and let \mathcal{P} = {t_j}_{j=0}^n be a partition of [a, r] such that 

(45.8)    \Lambda_a^r(\mathcal{P}) > \Lambda_a^r - \epsilon. 

If t_{n-1} < t < t_n = r, then let \mathcal{P}_t be the partition of [a, r] obtained by adding t 
between t_{n-1} and t_n = r in \mathcal{P}. Thus \mathcal{P}_t is a refinement of \mathcal{P}, so that 

(45.9)    \Lambda_a^r(\mathcal{P}_t) \ge \Lambda_a^r(\mathcal{P}). 

We can also consider \mathcal{P}_t as the combination of a partition of [a, t] with a single 
step from t to r, which implies that 

(45.10)    \Lambda_a^r(\mathcal{P}_t) \le \Lambda_a^t + d(f(t), f(r)). 

Hence 

(45.11)    \Lambda_a^t + d(f(t), f(r)) > \Lambda_a^r - \epsilon 

when t_{n-1} < t < r. This shows that Λ_a^t is continuous from the left at r when 
f is continuous from the left at r, using also the fact that Λ_a^t ≤ Λ_a^r when 
a ≤ t ≤ r. The argument for continuity on the right is very similar. 
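
The role of partitions and refinements in this argument can be illustrated with a short sketch. It computes the sum of distances along a partition for a path in the complex plane and checks that adding a point does not decrease this sum. The helper names and the example path (half of the unit circle, of length π) are assumptions made for illustration only.

```python
import cmath


def partition_length(f, ts):
    """Sum of |f(t_j) - f(t_{j-1})| over consecutive points of the partition ts."""
    return sum(abs(f(ts[j]) - f(ts[j - 1])) for j in range(1, len(ts)))


def refine(ts, t):
    """Return the partition obtained by adding the point t to ts."""
    return sorted(set(ts) | {t})


def arc(t):
    """Hypothetical example path: t -> exp(i t), whose length on [0, pi] is pi."""
    return cmath.exp(1j * t)


coarse = [0.0, cmath.pi / 2, cmath.pi]
fine = refine(coarse, 3 * cmath.pi / 4)
coarse_length = partition_length(arc, coarse)   # two chords of length sqrt(2)
fine_length = partition_length(arc, fine)       # at least as large, by the triangle inequality
```

Both sums are bounded by the length of the path, and the refined partition gives the larger of the two.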



46 Maximal functions 

Let μ be a positive finite Borel measure on the real line. The Hardy-Littlewood 
maximal function associated to μ is defined by 

(46.1)    \mu^*(x) = \sup_{x \in I} \frac{\mu(I)}{|I|}, 

where the supremum is taken over all open intervals I = (a, b) that contain x, and 
|I| = b - a is the length of I. Put 

(46.2)    E_t = \{x \in \mathbf{R} : \mu^*(x) > t\} 

for each t > 0. Thus x ∈ E_t if and only if there is an open interval I such that 
x ∈ I and 

(46.3)    \mu(I) > t \, |I|. 

In this case, I ⊂ E_t, and it follows that E_t is an open set in R. 

Suppose that K ⊂ E_t is compact. This implies that there are finitely many 
open intervals I_1, ..., I_n in R such that 

(46.4)    K \subset \bigcup_{j=1}^n I_j 

and 

(46.5)    \mu(I_j) > t \, |I_j| 

for each j. A basic property of the real line is that for any three intervals with 
a point in common, one of the intervals is contained in the union of the other 
two. This permits us to reduce the collection of intervals I_1, ..., I_n, without 
changing their union, in such a way that no element of R is contained in more 
than two of these intervals. It follows that 

(46.6)    \sum_{j=1}^n |I_j| < t^{-1} \sum_{j=1}^n \mu(I_j) \le 2 t^{-1} \mu\Big( \bigcup_{j=1}^n I_j \Big). 

More precisely, if 1_A is the indicator function on R associated to A ⊂ R, then 

(46.7)    \sum_{j=1}^n \mu(I_j) = \int_{\mathbf{R}} \Big( \sum_{j=1}^n 1_{I_j} \Big) \, d\mu \le \int_{\mathbf{R}} 2 \cdot 1_{\bigcup_{j=1}^n I_j} \, d\mu = 2 \mu\Big( \bigcup_{j=1}^n I_j \Big). 

If |K| denotes the Lebesgue measure of K, then we get that 

(46.8)    |K| \le 2 t^{-1} \mu(\mathbf{R}). 

Hence 

(46.9)    |E_t| \le 2 t^{-1} \mu(\mathbf{R}), 

because K is an arbitrary compact subset of E_t. 
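
The reduction of a finite family of open intervals to a subfamily with the same union, in which no point lies in more than two intervals, can be sketched as a greedy sweep from left to right. The notes do not spell out a procedure, so the following is one possible approach under that assumption; the function names are hypothetical.

```python
def reduce_cover(intervals):
    """Greedy reduction of a finite family of open intervals (a, b).

    Returns a subfamily with the same union in which every point lies in at
    most two intervals. Intervals are given as (a, b) tuples with a < b.
    """
    # Sort by left endpoint; among equal left endpoints, longest first.
    ivs = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    chosen = []
    i = 0
    while i < len(ivs):
        cur = ivs[i]                 # starts a new connected piece of the union
        chosen.append(cur)
        i += 1
        while True:
            best = None
            # Among intervals that overlap cur, find the one reaching farthest right.
            while i < len(ivs) and ivs[i][0] < cur[1]:
                if best is None or ivs[i][1] > best[1]:
                    best = ivs[i]
                i += 1
            if best is None or best[1] <= cur[1]:
                break                # nothing extends the current piece any further
            cur = best
            chosen.append(cur)
    return chosen


def max_overlap(intervals, points):
    """Largest number of the given open intervals containing any one sample point."""
    return max(sum(1 for a, b in intervals if a < x < b) for x in points)
```

In this sweep, each chosen interval starts to the right of the right endpoint of the interval chosen two steps earlier, which is why no point is covered more than twice.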
If f is an integrable function on R, then we put 

(46.10)    f^*(x) = \sup_{x \in I} \frac{1}{|I|} \int_I |f(y)| \, dy. 

This is the same as the maximal function μ^*(x) associated to the measure 

(46.11)    \mu(A) = \int_A |f(y)| \, dy. 

Thus the estimate in the previous paragraph can be re-expressed in this case as 

(46.12)    |\{x \in \mathbf{R} : f^*(x) > t\}| \le 2 t^{-1} \int_{\mathbf{R}} |f(y)| \, dy 

for each t > 0. 



47 Lebesgue's theorem 

Let f be a locally integrable function on the real line. A famous theorem of 
Lebesgue implies that 

(47.1)    \lim_{r \to 0} \frac{1}{2r} \int_{x-r}^{x+r} |f(y) - f(x)| \, dy = 0 

for almost every x ∈ R. We may as well suppose that f is integrable on R, 
since the problem is local. 
Put 

(47.2)    L(f)(x) = \limsup_{r \to 0} \frac{1}{2r} \int_{x-r}^{x+r} |f(y) - f(x)| \, dy 
                  = \lim_{\epsilon \to 0} \sup_{0 < r < \epsilon} \frac{1}{2r} \int_{x-r}^{x+r} |f(y) - f(x)| \, dy. 

Observe that 

(47.3)    L(f_1 + f_2)(x) \le L(f_1)(x) + L(f_2)(x), 

and that 

(47.4)    L(g)(x) = 0 

when g is continuous at x. It follows that 

(47.5)    L(f) = L(f - g) 

for every continuous function g. 
We also have that 

(47.6)    L(f)(x) \le f^*(x) + |f(x)|, 

where f^*(x) is as in (46.10). This implies that 

(47.7)    L(f)(x) \le (f - g)^*(x) + |f(x) - g(x)| 

for every continuous function g on R. Hence 

(47.8)    \{x \in \mathbf{R} : L(f)(x) > t\} 
              \subset \{x \in \mathbf{R} : (f - g)^*(x) > t/2\} \cup \{x \in \mathbf{R} : |f(x) - g(x)| > t/2\} 

for every t > 0. 

As in the previous section, 

(47.9)    |\{x \in \mathbf{R} : (f - g)^*(x) > t/2\}| \le 2 (t/2)^{-1} \|f - g\|_1 = 4 t^{-1} \|f - g\|_1. 

Similarly, 

(47.10)    |\{x \in \mathbf{R} : |f(x) - g(x)| > t/2\}| \le (t/2)^{-1} \int_{\mathbf{R}} |f(y) - g(y)| \, dy = 2 t^{-1} \|f - g\|_1 

for every t > 0. Of course, we can choose g so that ‖f - g‖_1 is arbitrarily small, 
because continuous functions are dense in L^1(R). Thus, by (47.8), the set where 
L(f)(x) > t is contained in a set of measure at most 6 t^{-1} ‖f - g‖_1 for every such g, 
and hence in a set of measure 0 for each t > 0. Using this, one can show that 
L(f)(x) = 0 almost everywhere, as desired. 



48 Singular measures 



Let μ be a positive finite Borel measure on the real line which is singular with 
respect to Lebesgue measure. This means that there is a Borel set B ⊂ R whose 
Lebesgue measure |B| is 0 while μ(R\B) = 0. Let us check that 

(48.1)    \lim_{r \to 0} \frac{\mu((x-r, x+r))}{2r} = 0 

for almost every x ∈ R with respect to Lebesgue measure. If B happens to be 
a closed set in R, then this holds trivially for every x ∈ R\B. The idea is to 
use the maximal function to make an approximation by this type of situation. 
Consider 

(48.2)    L(\mu)(x) = \limsup_{r \to 0} \frac{\mu((x-r, x+r))}{2r} 
                    = \lim_{\epsilon \to 0} \sup_{0 < r < \epsilon} \frac{\mu((x-r, x+r))}{2r}, 

in analogy with the previous section. Thus 

(48.3)    L(\mu)(x) \le \mu^*(x) 

and 

(48.4)    L(\mu_1 + \mu_2)(x) \le L(\mu_1)(x) + L(\mu_2)(x) 

for every pair of positive Borel measures μ_1, μ_2 on R. 

Let U be an open set in R such that B ⊂ U, and let K be a compact set in 
R such that K ⊂ U. Also let μ_1, μ_2 be the Borel measures on R defined by 

(48.5)    \mu_1(A) = \mu(A \cap K), \qquad \mu_2(A) = \mu(A \cap (\mathbf{R} \setminus K)). 

Thus L(μ_1)(x) = 0 when x ∈ R\K, which implies that 

(48.6)    L(\mu)(x) \le L(\mu_2)(x) \le \mu_2^*(x) 

for every x ∈ R\K, and hence for every x ∈ R\U. The main point now is to 
choose K ⊂ U so that 

(48.7)    \mu(U \setminus K) = \mu(\mathbf{R} \setminus K) = \mu_2(\mathbf{R}) 

is arbitrarily small. This is easy to do, using the fact that open subsets of the real 
line are σ-compact. This implies that L(μ)(x) = 0 for Lebesgue almost every 
x ∈ R\U, by the maximal function estimates in Section 46. More precisely, 

(48.8)    \{x \in \mathbf{R} \setminus U : L(\mu)(x) > t\} \subset \{x \in \mathbf{R} : \mu_2^*(x) > t\} 

for every t > 0, and the Lebesgue measure of the set on the right can be made 
arbitrarily small, by choosing K so that (48.7) is small. This implies that 
L(μ)(x) ≤ t almost everywhere on R\U with respect to Lebesgue measure for 
each t > 0, and hence that L(μ)(x) = 0 almost everywhere on R\U, by taking 
t = 1/n, where n is a positive integer. It follows that L(μ)(x) = 0 for Lebesgue 
almost every x ∈ R, as desired, since we can also choose U to have arbitrarily 
small Lebesgue measure, because |B| = 0. 



49 Differentiability almost everywhere 



Let α be a bounded real-valued monotone increasing function on the real line, 
and let μ_α be the corresponding positive Borel measure on R, as in Section 
44. Using the Lebesgue decomposition and Radon-Nikodym theorem, we get 
an integrable function f with respect to Lebesgue measure and a Borel measure 
ν that is singular with respect to Lebesgue measure such that 

(49.1)    \mu_\alpha(A) = \int_A f(y) \, dy + \nu(A). 

We would like to show that α(x) is differentiable almost everywhere on R with 
respect to Lebesgue measure, and more precisely that α'(x) = f(x) almost 
everywhere. 

Thus we would like to show that 

(49.2)    \lim_{h \to 0} \frac{\alpha(x+h) - \alpha(x)}{h} = f(x) 

for almost every x ∈ R. As a first approximation, we have that 

(49.3)    \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(y) \, dy = f(x) 

for almost every x ∈ R, by Lebesgue's theorem. More precisely, this integral is 
supposed to be oriented, as in calculus, so that the integral from x to x + h is 
-1 times the integral from x + h to x. This means that we are looking at the 
average of f over the interval [x, x + h] when h > 0, and over [x + h, x] when 
h < 0. 

It remains to show that 

(49.4)    \frac{\alpha(x+h) - \alpha(x)}{h} - \frac{1}{h} \int_x^{x+h} f(y) \, dy 

converges to 0 as h → 0 for almost every x ∈ R. If α is continuous at x and x + h, 
then this difference is equal to ν([x, x + h])/h when h > 0, and similarly when 
h < 0. In any case, this difference is nonnegative, bounded by ν([x, x + h])/h 
when h > 0, and similarly for h < 0. Hence the difference converges to 0 almost 
everywhere, as in the previous section. 

Of course, it is not important that α be bounded or defined on the whole 
line, since the problem is local. If α is a real or complex-valued function of 
bounded variation on R, then α can be expressed as a linear combination of 
monotone functions, and is therefore differentiable almost everywhere too. 

50 Maximal functions, 2 

The maximal function of a positive Borel measure μ on R can also be given by 

(50.1)    \mu^*(x) = \sup_{x \in I} \frac{\mu(I)}{|I|}, 

where now the supremum is taken over all closed intervals I = [a, b] that contain 
x and have positive length |I| = b - a. The previous definition is clearly less 
than or equal to this one, since every open interval (a, b) is contained in a closed 
interval [a, b] with the same length, and 

(50.2)    \mu((a,b)) \le \mu([a,b]). 

In the other direction, one can approximate closed intervals by open intervals 
that contain them. 

Let α be a bounded monotone increasing real-valued function on the real 
line. If μ_α is the corresponding measure, as in Section 44, then its maximal 
function can be expressed directly in terms of α, by 

(50.3)    \sup_{a \le x \le b} \frac{\alpha(b) - \alpha(a)}{b - a}. 

More precisely, the supremum is taken over a, b ∈ R with a ≤ x ≤ b and a < b, 
and this expression for the maximal function is trapped between the previous 
two, by (44.3). If E_t = {x ∈ R : μ_α^*(x) > t}, then the main estimate from 
Section 46 can be reformulated as 

(50.4)    |E_t| \le 2 t^{-1} \Big( \sup_{x \in \mathbf{R}} \alpha(x) - \inf_{x \in \mathbf{R}} \alpha(x) \Big). 

Now let (M, d(x, y)) be a metric space, and let f : [a, b] → M be a path of 
finite length. Let α(r) be the length Λ_a^r of the restriction of f to [a, r] when 
a ≤ r ≤ b, and put α(r) = 0 when r < a, α(r) = Λ_a^b when r > b. Thus α is a 
bounded monotone increasing function on R, and 

(50.5)    d(f(r), f(r')) \le \Lambda_r^{r'} = \alpha(r') - \alpha(r) 

when a ≤ r ≤ r' ≤ b. If [r, r'] contains an element of R\E_t, where t > 0 and E_t 
is as in the previous paragraph, then 

(50.6)    d(f(r), f(r')) \le \alpha(r') - \alpha(r) \le t \, (r' - r). 

In particular, the restriction of f to [a, b]\E_t is Lipschitz with constant t. Note 
that [a, b]\E_t is a closed set, because E_t is open. Also, (50.4) reduces to 

(50.7)    |E_t| \le 2 t^{-1} \Lambda_a^b. 

If our metric space is a real or complex vector space with a norm, then we 
can extend the restriction of f to [a, b]\E_t to a t-Lipschitz function f_t on [a, b]. 
Remember that E_t can be expressed as the union of finitely or countably many 
pairwise-disjoint open intervals, since E_t is an open set in R. If I is one of these 
open intervals and I ⊂ [a, b], then f_t is defined on I as the affine function that 
agrees with f on the endpoints. If a or b is an element of E_t, and I is an open 
interval in E_t that contains a or b and whose other endpoint is in [a, b], then 
we can take f_t to be the constant on I ∩ [a, b] that agrees with f at the other 
endpoint of I. Of course, if [a, b] ⊂ E_t, then there is nothing to do. 



51 Vector-valued functions 



Let V be a real or complex vector space with a norm. As usual, a function 
F : [a, b] → V is said to be differentiable at x ∈ (a, b) if 

(51.1)    \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} 

exists in V. One can also consider one-sided limits at the endpoints. 

For example, let V be L^1([0, 1]), with respect to Lebesgue measure. Let F(x) 
be the indicator function of [0, x] as an element of L^1([0, 1]) for each x ∈ [0, 1]. 
It is easy to see that 

(51.2)    \|F(x) - F(y)\|_1 = |x - y| 

for every x, y ∈ [0, 1], so that F is actually an isometric embedding of [0, 1] in 
L^1([0, 1]). However, one can also check that F is not differentiable at any point 
in [0, 1]. The derivative of F at x ∈ [0, 1] is basically a Dirac mass at x, in a 
weak sense that we shall discuss later. 
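
One way to see the failure of differentiability is to compare difference quotients for two step sizes. For h > 0, the difference quotient of F at x is 1/h times the indicator function of (x, x + h], and the following sketch computes the exact L^1 distance between two such quotients. The function name is hypothetical, and the computation assumes that both step sizes are positive and that x plus the larger step stays in [0, 1].

```python
from fractions import Fraction


def quotient_distance(h1, h2):
    """Exact L^1 distance between (1/h1) 1_(x, x+h1] and (1/h2) 1_(x, x+h2] for h1, h2 > 0.

    The value does not depend on x, as long as x + max(h1, h2) <= 1.
    """
    lo, hi = sorted((Fraction(h1), Fraction(h2)))
    # On (x, x+lo] the quotients differ by 1/lo - 1/hi; on (x+lo, x+hi] only 1/hi remains.
    return lo * (1 / lo - 1 / hi) + (hi - lo) * (1 / hi)


# Distances between the quotients for h = 2^-n and h = 2^-(n+1).
distances = [quotient_distance(Fraction(1, 2 ** n), Fraction(1, 2 ** (n + 1)))
             for n in range(1, 10)]
```

Each of these distances is equal to 1, so the difference quotients along h = 2^{-n} do not form a Cauchy sequence in L^1([0, 1]).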

Now let V be L^∞(R). If f is a bounded real or complex-valued Lipschitz 
function on R, then let F : R → L^∞(R) be the mapping that sends x ∈ R to 
the translate f_x(·) = f(· - x) of f by x. It is easy to see that this is a Lipschitz 
mapping from the real line into L^∞(R), because f is a Lipschitz function on 
R. If F is differentiable at any point in R as a mapping into L^∞(R), then 
the difference quotient for f would converge uniformly on R. This would imply 
that f is continuously differentiable on R, with uniformly continuous derivative. 
Conversely, if f is continuously differentiable on R, with uniformly continuous 
derivative, then the difference quotient for f does converge uniformly to the 
derivative of f, and F is differentiable at every point in R. More precisely, the 
derivative of F at x ∈ R corresponds to -1 times the derivative of f translated 
by x in this case. If f is not bounded, then one can take F(x) = f_x - f, and 
get similar conclusions. 

Let V be any vector space with a norm ‖v‖ again, and suppose that F, G 
are V-valued functions of finite length on an interval [a, b]. One can check 
that F - G also has finite length on [a, b], which is less than or equal to the 
sum of the lengths of F and G. It follows that ‖F - G‖ has finite length as a 
real-valued function on [a, b], which is to say that it has bounded variation. In 
particular, ‖F - G‖ is differentiable almost everywhere as a real-valued function 
on [a, b]. If x ∈ [a, b] is a limit point of the set where F = G, and hence a limit 
point of the set where ‖F - G‖ = 0, and if ‖F - G‖ is differentiable at x, then 
the derivative of ‖F - G‖ at x is equal to 0. This implies that the derivative of 
F - G exists at x and is equal to 0, under these conditions. In particular, this 
can be applied to Lipschitz approximations G of F as in the previous section. 
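
The Lipschitz approximation from the previous section, which is affine on each gap and constant past the first or last retained point, can be sketched as follows for a finite sample of retained points. Working with a finite sorted list `xs` instead of a general closed set is an assumption made for illustration, as are the function name and the sample data.

```python
import bisect


def extend_piecewise_affine(xs, ys, t):
    """Evaluate the extension at t.

    xs: sorted retained points; ys: values of the function at those points
    (real or complex numbers). The extension is affine on each gap between
    consecutive retained points and constant outside [xs[0], xs[-1]].
    """
    if t <= xs[0]:
        return ys[0]
    if t >= xs[-1]:
        return ys[-1]
    k = bisect.bisect_right(xs, t)        # xs[k-1] <= t < xs[k]
    x0, x1 = xs[k - 1], xs[k]
    w = (t - x0) / (x1 - x0)
    return (1 - w) * ys[k - 1] + w * ys[k]


# Hypothetical sample: values that are 1-Lipschitz on the retained points.
sample_xs = [0.0, 1.0, 3.0]
sample_ys = [0.0, 1.0, 2.0]
midpoint_value = extend_piecewise_affine(sample_xs, sample_ys, 2.0)
```

If the values are Lipschitz with constant t on the retained points, then the slopes of the affine pieces are bounded by t, so the extension is Lipschitz with the same constant.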

52 Uniform boundedness, 4 

Let W be a real or complex vector space with a norm ‖w‖, and let {λ_j}_{j=1}^∞ be 
a sequence of bounded linear functionals on W. Suppose that the dual norms 
of the λ_j's are uniformly bounded, so that 

(52.1)    \|\lambda_j\|_* \le L 

for some L ≥ 0 and each j. Under these conditions, one can check that the set 
of w ∈ W such that {λ_j(w)}_{j=1}^∞ is a Cauchy sequence in R or C, as appropriate, 
is closed. Because of the completeness of the real and complex numbers, this is 
the same as saying that the set of w ∈ W such that {λ_j(w)}_{j=1}^∞ converges in R 
or C is closed. It is easy to see that this is also a linear subspace of W. 

In particular, {λ_j(w)}_{j=1}^∞ converges for every w ∈ W if it converges for a set 
of w's whose linear span is dense in W. In this case, 

(52.2)    \lambda(w) = \lim_{j \to \infty} \lambda_j(w) 

defines a linear functional on W. More precisely, λ is a bounded linear functional 
on W, with 

(52.3)    \|\lambda\|_* \le L, 

because of (52.1). 

Conversely, if {λ_j(w)}_{j=1}^∞ converges for every w ∈ W, then {λ_j(w)}_{j=1}^∞ is 
bounded for every w ∈ W. The Banach-Steinhaus theorem implies that the 
λ_j's have uniformly bounded dual norms when W is complete, as in Section [2^1. 

Suppose now that E is a set of real numbers, and that for each t ∈ E we 
have a bounded linear functional λ_t on W. Suppose also that 0 is a limit point 
of E in R, and that the λ_t's have uniformly bounded dual norms. If 

(52.4)    \lim_{t \to 0,\ t \in E} \lambda_t(w) 

exists in R or C, as appropriate, for a set of w ∈ W whose linear span is dense 
in W, then this limit exists for every w ∈ W, and determines a bounded linear 
functional on W. This is a variant of the earlier discussion for sequences. One 
can also apply the previous remarks to sequences of elements of E that converge 
to 0. 

53 Weak* derivatives 

Let W be a real or complex vector space with a norm ‖w‖, and let F(x) be a 
function on a closed interval [a, b] in the real line with values in the dual W^* 
of W. Thus F(x)(w) is a real or complex-valued function of x on [a, b] for each 
w ∈ W, as appropriate. If F(x) has finite length as a mapping from [a, b] into 
W^*, then F(x)(w) has bounded variation as a real or complex-valued function 
of x on [a, b] for every w ∈ W. This implies that for each w ∈ W there is a 
set Z(w) ⊂ [a, b] of Lebesgue measure 0 such that F(x)(w) is differentiable for 
every x ∈ [a, b]\Z(w). 

Suppose that W is separable, so that there is a collection {w_l}_l of finitely 
or countably many elements of W whose linear span is dense in W. Thus 
Z = ∪_l Z(w_l) also has Lebesgue measure 0. If x ∈ [a, b]\Z, then F(x)(w_l) is 
differentiable at x for each l. 
We also know that 

(53.1)    \sup_{a \le y \le b,\ y \ne x} \frac{\|F(x) - F(y)\|_*}{|x - y|} < \infty 

for almost every x ∈ [a, b]. This follows from the finiteness almost everywhere 
of the maximal function associated to the function that measures the length 
of F on [a, r], as in Section 50. If x has this property and x ∉ Z, then one can 
check that the derivative 

(53.2)    \lim_{h \to 0} \frac{F(x+h)(w) - F(x)(w)}{h} 

of F(x)(w) at x exists for every w ∈ W, using the remarks in the previous 
section. Hence the derivative 

(53.3)    \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} 

exists for almost every x ∈ [a, b] in the weak* topology under these conditions. 

Let W be the space of continuous real or complex-valued functions on [0, 1] 
with the supremum norm, so that W^* can be identified with the space of real or 
complex Borel measures on [0, 1], as appropriate. Also let F(x) be the function 
on [0, 1] with values in W^* that assigns to x ∈ [0, 1] the measure on [0, 1] that 
is Lebesgue measure on [0, x]. This is basically the same as the function on 
[0, 1] with values in L^1([0, 1]) discussed in Section 51, by identifying integrable 
functions on [0, 1] with absolutely continuous measures with respect to Lebesgue 
measure. Now that we consider F to take values in W^*, it is easy to see that 
the derivative of F exists with respect to the weak* topology on W^* at every 
x ∈ [0, 1], and corresponds to a Dirac mass at x. 



54 Lipschitz functions 

Let f be a real or complex-valued Lipschitz function on the real line. Thus f is 
differentiable almost everywhere, since it has bounded variation on any bounded 
interval. In particular, 

(54.1)    \lim_{j \to \infty} \frac{f(x+h_j) - f(x)}{h_j} = f'(x) 

almost everywhere for every sequence {h_j}_{j=1}^∞ of nonzero real numbers that 
converges to 0. This implies that 

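(54.2)    \lim_{j \to \infty} \int_{\mathbf{R}} \frac{f(x+h_j) - f(x)}{h_j} \, \phi(x) \, dx = \int_{\mathbf{R}} f'(x) \, \phi(x) \, dx 

for every integrable function φ on R, by the dominated convergence theorem. 
More precisely, this also uses the fact that the difference quotients are uniformly 
bounded, because f is Lipschitz. Hence 

(54.3)    \lim_{h \to 0} \int_{\mathbf{R}} \frac{f(x+h) - f(x)}{h} \, \phi(x) \, dx = \int_{\mathbf{R}} f'(x) \, \phi(x) \, dx. 

This is the same as saying that 

(54.4)    \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = f'(x) 

in the weak* topology on L^∞(R), as the dual of L^1(R). 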
Alternatively, we can start with the identity 

(54.5)    \int_{\mathbf{R}} \frac{f(x+h) - f(x)}{h} \, \phi(x) \, dx = - \int_{\mathbf{R}} f(x) \, \frac{\phi(x) - \phi(x-h)}{h} \, dx, 

which uses the change of variables x ↦ x - h. This implies that 

(54.6)    \lim_{h \to 0} \int_{\mathbf{R}} \frac{f(x+h) - f(x)}{h} \, \phi(x) \, dx = - \int_{\mathbf{R}} f(x) \, \phi'(x) \, dx 

when φ is a continuously-differentiable function with compact support on R, 
for instance. Thus 

(54.7)    \lambda_h(\phi) = \int_{\mathbf{R}} \frac{f(x+h) - f(x)}{h} \, \phi(x) \, dx 

defines a bounded family of linear functionals on L^1(R) that converges as h → 0 
on a dense linear subspace of L^1(R), and hence converges on all of L^1(R), as 
in Section 52. The limit is a bounded linear functional on L^1(R) that can be 
expressed by integration with an element of L^∞(R), which corresponds to the 
derivative of f. 

If f is a bounded Lipschitz function on R, then we can take F : R → L^∞(R) 
to be the function that sends a real number to the corresponding translate 
of f, as in Section 51. Otherwise, we can take a difference between f and 
its translate to get an element of L^∞(R), as before. This defines a Lipschitz 
mapping from R into L^∞(R), with a weak* derivative at every point. 

55 Averages 

Let f be a locally integrable function on the real line, and put 

(55.1)    A_h(f)(x) = \frac{1}{h} \int_x^{x+h} f(y) \, dy 

for every h, x ∈ R with h ≠ 0. As before, the integral in this expression is 
considered to be oriented, as in ordinary calculus, so that 

(55.2)    A_h(f)(x) = \frac{1}{|h|} \int_{x-|h|}^{x} f(y) \, dy 

when h < 0. In particular, 

(55.3)    |A_h(f)(x)| \le A_h(|f|)(x). 

If f ∈ L^p(R), 1 ≤ p ≤ ∞, then A_h(f) ∈ L^p(R) for every h ≠ 0, and 

(55.4)    \|A_h(f)\|_p \le \|f\|_p. 

This is very easy to see when p = ∞. If p = 1, then one can integrate (55.3) in 
x, and then use Fubini's theorem. If 1 < p < ∞, then 

(55.5)    |A_h(f)(x)|^p \le A_h(|f|^p)(x), 

by the convexity of r^p on the nonnegative real numbers, as in Jensen's inequality. 
One can then integrate in x and apply Fubini's theorem, as when p = 1. 
If f is continuous at x, then 

(55.6)    \lim_{h \to 0} A_h(f)(x) = f(x). 

If f is uniformly continuous, then this holds with uniform convergence. If f is 
a continuous function on R, then f is uniformly continuous on bounded sets, 
and we get uniform convergence on bounded sets. 
If f ∈ L^p(R), 1 ≤ p < ∞, then 

(55.7)    \lim_{h \to 0} \|A_h(f) - f\|_p = 0. 

To see this, observe first that this holds for every continuous function f with 
compact support on the real line. More precisely, f is uniformly continuous in 
this case, so that A_h(f) converges to f uniformly as h → 0, as in the previous 
paragraph. Also, the support of A_h(f) is contained in a single compact set when 
|h| ≤ 1, say, and hence uniform convergence implies convergence in the L^p(R) 
norm. Any f ∈ L^p(R) can be approximated in the L^p norm by a continuous 
function with compact support when p < ∞, and one can get (55.7) using this 
approximation and the uniform bounds for A_h on L^p(R). 
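
The oriented averages can be approximated numerically with a midpoint rule, as in the following sketch. The function name, the number `n` of sample points, and the test function are assumptions made for illustration. Using a signed step builds the orientation into the computation, so that negative h gives the average over [x + h, x].

```python
def average(f, x, h, n=1000):
    """Approximate A_h(f)(x) = (1/h) * (oriented integral of f from x to x+h) by the midpoint rule."""
    if h == 0:
        raise ValueError("h must be nonzero")
    step = h / n                       # negative when h < 0
    total = sum(f(x + (k + 0.5) * step) for k in range(n))
    return total * step / h            # equals the mean of the n midpoint samples


def square(y):
    """Hypothetical test function; here A_h(f)(x) = x^2 + x h + h^2 / 3 exactly."""
    return y * y


errors = [abs(average(square, 1.0, h) - square(1.0)) for h in (0.1, 0.01, 0.001)]
```

For this continuous test function, the errors decrease as h tends to 0, in line with the convergence of the averages at points of continuity.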

56 L^p derivatives 

If f, g are locally integrable functions on the real line, then we say that f' = g 
in the sense of distributions if 

(56.1)    \int_{\mathbf{R}} f(x) \, \phi'(x) \, dx = - \int_{\mathbf{R}} g(x) \, \phi(x) \, dx 

for every continuously-differentiable function φ with compact support on R. If 
f is continuously differentiable on R, then the ordinary derivative of f has this 
property, by integration by parts. Similarly, if 

(56.2)    \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = g(x) 

with respect to the L^1 norm on any bounded interval in the real line, then f' = g 
in the sense of distributions. This follows from (54.5), by taking the limit as 
h → 0. 
Suppose that 

(56.3)    f(x+h) - f(x) \in L^p(\mathbf{R}) 

for some p, 1 ≤ p ≤ ∞, and every h ∈ R, which holds in particular when 
f ∈ L^p(R). We say that f is differentiable in the L^p sense, with derivative equal 
to g, if g ∈ L^p(R), and one has convergence in (56.2) in the L^p norm. This 
implies that f' = g in the sense of distributions, as in the previous paragraph. 
If g ∈ L^p(R) and 

(56.4)    f(x) = \int_a^x g(y) \, dy 

for some a ∈ R, then 

(56.5)    \frac{f(x+h) - f(x)}{h} = A_h(g)(x) 

converges to g as h → 0 in the L^p norm when p < ∞, as in the previous section, 
and so the derivative of f is equal to g in the L^p sense. If g is locally integrable, then 
A_h(g) → g as h → 0 in the L^1 norm on every bounded interval, and we still 
have that f' = g in the sense of distributions. 

Note that f' = 0 in the sense of distributions when 

(56.6)    \int_{\mathbf{R}} f(x) \, \phi'(x) \, dx = 0 

for every continuously-differentiable function φ with compact support. If ψ is a 
continuous function with compact support on R such that 

(56.7)    \int_{\mathbf{R}} \psi(y) \, dy = 0, 

then 

(56.8)    \phi(x) = \int_{-\infty}^{x} \psi(y) \, dy 

is continuously differentiable and has compact support, and φ' = ψ. Thus f' = 0 
in the sense of distributions if and only if 

(56.9)    \int_{\mathbf{R}} f(x) \, \psi(x) \, dx = 0 

for every continuous function ψ with compact support and integral 0. One can 
show that this happens if and only if f is constant almost everywhere. 

If f' = g in the sense of distributions, then it follows that the difference 
between f and (56.4) is constant almost everywhere, since they have the same 
derivative. In particular, (56.5) holds for each h ≠ 0 and almost every x. If 
g ∈ L^p(R), then we get that the derivative of f is equal to g in the L^p sense, as 
before. 



57 L^p Lipschitz conditions 



Let f be a locally integrable function on the real line that satisfies (56.3) for 
some p, 1 ≤ p < ∞, and every h ∈ R, such as an L^p function. Suppose that 

(57.1)    \Big( \int_{\mathbf{R}} |f(x+h) - f(x)|^p \, dx \Big)^{1/p} \le C \, |h| 

for some C ≥ 0 and every h ∈ R, which is the same as saying that 

(57.2)    \frac{f(x+h) - f(x)}{h} 

is uniformly bounded in L^p(R). Note that this happens when f' = g ∈ L^p(R) 
in the sense of distributions, since the difference quotient is equal to A_h(g). 

Suppose also that 1 < p < ∞, and let q be the conjugate exponent to p, 
1/p + 1/q = 1. If λ_h is as in (54.7) for h ≠ 0, then λ_h is a uniformly bounded 
family of linear functionals on L^q(R), by Hölder's inequality. As in Section 54, 

(57.3)    \lim_{h \to 0} \lambda_h(\phi) = - \int_{\mathbf{R}} f(x) \, \phi'(x) \, dx 

for every continuously-differentiable function φ with compact support on R. 
Because these functions are dense in L^q(R), it follows that 

(57.4)    \lim_{h \to 0} \lambda_h(\phi) 

exists for every φ ∈ L^q(R), as in Section 52. The limit determines a bounded 
linear functional on L^q(R), and so there is a function g ∈ L^p(R) such that 

(57.5)    \lim_{h \to 0} \lambda_h(\phi) = \int_{\mathbf{R}} g(x) \, \phi(x) \, dx 

for every φ ∈ L^q(R). In particular, this holds when φ is a continuously- 
differentiable function with compact support on R, for which we have (57.3). 
This shows that f' = g in the sense of distributions. 

If p = 1, then it is better to think of λ_h as a uniformly bounded family 
of linear functionals on the space C_0(R) of continuous functions on the real 
line that vanish at infinity, equipped with the supremum norm. We still have 
(57.3) for every continuously-differentiable function φ with compact support on 
R, and hence that (57.4) exists for every φ ∈ C_0(R), as in Section 52. The 
limit determines a bounded linear functional on C_0(R), and so there is a real 
or complex Borel measure μ on R such that 

(57.6)    \lim_{h \to 0} \lambda_h(\phi) = \int_{\mathbf{R}} \phi \, d\mu 

for every φ ∈ C_0(R). Combining this with (57.3), we get that 

(57.7)    \int_{\mathbf{R}} f(x) \, \phi'(x) \, dx = - \int_{\mathbf{R}} \phi \, d\mu 

for every continuously-differentiable function φ with compact support on R. 
This can be expressed by saying that f' = μ in the sense of distributions. 

If α is a function of bounded variation on R, and if μ_α is the corresponding 
real or complex Borel measure as in Section 44, then α' = μ_α in the sense of 
distributions. This is basically another version of integration by parts. One can 
also show that every real or complex Borel measure on the real line is of this 
form. If f is a locally integrable function on R such that f' = μ in the sense of 
distributions for some real or complex Borel measure μ, then it follows that f 
is equal almost everywhere to a function of bounded variation. Conversely, one 
can check that such functions satisfy the integrated Lipschitz condition (57.1) 
with p = 1. 
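
As a simple check of the integrated Lipschitz condition with p = 1, the following sketch estimates the integral of |f(x + h) - f(x)| for the indicator function of [0, 1), which has bounded variation. For |h| ≤ 1 the exact value is 2|h|, so the condition holds with C = 2. The function names, the integration window [-3, 3], and the grid size are assumptions made for illustration.

```python
def indicator(x):
    """Indicator function of [0, 1), a hypothetical example of bounded variation."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def l1_modulus(f, h, lo=-3.0, hi=3.0, n=60000):
    """Midpoint Riemann sum for the integral of |f(x+h) - f(x)| over [lo, hi].

    Assumes that f and its translate by h vanish outside [lo, hi].
    """
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        total += abs(f(x + h) - f(x))
    return total * dx


estimates = {h: l1_modulus(indicator, h) for h in (0.25, -0.5, 0.1)}
```

Each estimate is close to 2|h|, up to an error on the order of the grid spacing.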



58 Dyadic intervals 

In this section, it will be convenient to use [0, 1) as the unit interval, consisting 
of x ∈ R with 0 ≤ x < 1. By a dyadic subinterval of the unit interval we mean 
an interval of the form [j 2^{-l}, (j + 1) 2^{-l}), where j, l are nonnegative integers 
and j < 2^l. Thus, for each l, the unit interval is the disjoint union of the dyadic 
intervals at level l. If I, I' are dyadic intervals of arbitrary lengths, then either I ⊂ I', 
I' ⊂ I, or I ∩ I' = ∅. More precisely, if |I| ≤ |I'|, where |I| denotes the length 
of I, then either I ⊂ I' or I ∩ I' = ∅. 

Let μ be a positive Borel measure on [0, 1). The dyadic maximal function 
associated to μ is defined by 

(58.1)    \mu_\delta^*(x) = \sup_{x \in I} \frac{\mu(I)}{|I|}, 

where now the supremum is taken over all dyadic intervals that contain a given 
point x ∈ [0, 1). Similarly, if f is an integrable function on [0, 1), then we put 

(58.2)    f_\delta^*(x) = \sup_{x \in I} \frac{1}{|I|} \int_I |f(y)| \, dy, 

where again the supremum is taken over all dyadic intervals I such that x ∈ I. 
This is the same as μ_δ^*(x), where μ is the Borel measure on [0, 1) defined by 

(58.3)    \mu(A) = \int_A |f(y)| \, dy, 

as in Section 46. 
Consider 

(58.4)    E_{\delta,t} = \{x \in [0,1) : \mu_\delta^*(x) > t\} 

for each t > 0. Thus x ∈ E_{δ,t} if and only if there is a dyadic interval I such that 
x ∈ I and 

(58.5)    \mu(I) > t \, |I|, 

in which case I ⊂ E_{δ,t}. Let I(x) be the maximal dyadic interval that contains 
x and satisfies (58.5) for each x ∈ E_{δ,t}. If x, y ∈ E_{δ,t}, then either I(x) = I(y) 
or I(x) ∩ I(y) = ∅, by maximality and the nesting properties of dyadic intervals 
mentioned before. 

Let $\mathcal{M}_t$ be the collection of dyadic intervals of the form $I(x)$ for some $x$ in
$E_{\delta,t}$. Note that the elements of $\mathcal{M}_t$ are pairwise disjoint, and

(58.6)    $\bigcup_{I \in \mathcal{M}_t} I = E_{\delta,t}.$

Hence

(58.7)    $|E_{\delta,t}| = \sum_{I \in \mathcal{M}_t} |I| \le t^{-1} \sum_{I \in \mathcal{M}_t} \mu(I) = t^{-1} \, \mu(E_{\delta,t}).$

This is almost the same as the estimate in Section 46, but without the additional
factor of 2. Although we have focused on dyadic subintervals of the unit interval
for simplicity, there is an analogous discussion for arbitrary dyadic intervals in
the real line, and the corresponding maximal functions.
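
As a concrete check of (58.7), the following Python sketch computes the dyadic
maximal function of a nonnegative step function that is constant on dyadic cells
of length $2^{-L}$, and tests that the measure of the set where the maximal function
exceeds $t$ is at most $t^{-1}$ times the mass of that set. The names
`dyadic_maximal` and `weak_type_holds` and the sample data are illustrative
assumptions, not part of these notes. Exact rational arithmetic is used so that
the comparison is not affected by rounding.

```python
from fractions import Fraction


def dyadic_maximal(values):
    """Dyadic maximal function of a nonnegative step function on [0, 1).

    values[i] is the value on the i-th dyadic cell of length 2**-L, where
    len(values) == 2**L.  Dyadic intervals shorter than a cell give the
    cell value as their average, so only levels 0..L need to be examined.
    """
    n = len(values)
    sums = [Fraction(v) for v in values]   # sums of cell values over current blocks
    best = list(sums)                      # running supremum of averages, per cell
    block = 1                              # number of cells per dyadic interval
    while len(sums) > 1:
        sums = [sums[2 * i] + sums[2 * i + 1] for i in range(len(sums) // 2)]
        block *= 2
        for i in range(n):
            best[i] = max(best[i], sums[i // block] / block)
    return best


def weak_type_holds(values, t):
    """Check |E_t| <= mu(E_t) / t, where E_t = {maximal function > t}."""
    n = len(values)
    maximal = dyadic_maximal(values)
    cells = [i for i in range(n) if maximal[i] > t]
    measure_of_set = Fraction(len(cells), n)                                # |E_t|
    mass_of_set = sum((Fraction(values[i]) for i in cells), Fraction(0)) / n  # mu(E_t)
    return measure_of_set * t <= mass_of_set


f = [0, 8, 1, 0, 0, 0, 2, 1]               # sample values on 8 dyadic cells
print(dyadic_maximal(f))
print(all(weak_type_holds(f, t) for t in (Fraction(1, 2), 1, 2, 3, 5)))
```

The loop over levels mirrors the definition of the supremum over dyadic intervals
containing a given cell, from the cell itself up to the whole unit interval.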



59 Dyadic averages 

Let $f$ be an integrable function on $[0, 1)$, and put

(59.1)    $A_l(f)(x) = 2^l \int_{j 2^{-l}}^{(j+1) 2^{-l}} f(y) \, dy$

when $j \, 2^{-l} \le x < (j + 1) \, 2^{-l}$. Thus $A_l(f)(x)$ is the average of $f$ over the dyadic
interval of length $2^{-l}$ that contains $x$. In particular, $A_l(f)$ is constant on dyadic
intervals of length $2^{-l}$, by construction. Also,

(59.2)    $\int_0^1 A_l(f)(x) \, dx = \sum_{j=0}^{2^l - 1} \int_{j 2^{-l}}^{(j+1) 2^{-l}} A_l(f)(x) \, dx$
          $= \sum_{j=0}^{2^l - 1} \int_{j 2^{-l}}^{(j+1) 2^{-l}} f(x) \, dx = \int_0^1 f(x) \, dx.$


If $f \in L^p([0, 1))$, $1 \le p \le \infty$, then

(59.3)    $\|A_l(f)\|_p \le \|f\|_p.$

This is immediate when $p = \infty$. Note that

(59.4)    $|A_l(f)(x)| \le A_l(|f|)(x)$

for every $x \in [0, 1)$ and $l \ge 0$, and that

(59.5)    $|A_l(f)(x)|^p \le A_l(|f|^p)(x)$

when $f \in L^p([0, 1))$, $1 < p < \infty$, by Jensen's inequality. To estimate $\|A_l(f)\|_p$,
one can integrate these inequalities using the identity in the previous paragraph.

As in Section 55,

(59.6)    $\lim_{l \to \infty} A_l(f)(x) = f(x)$

when $f$ is continuous at $x$, and with uniform convergence when $f$ is uniformly
continuous on $[0, 1)$. If $f$ is a continuous function on $[0, 1]$, then $f$ is uniformly
continuous, by compactness. If $f \in L^p([0, 1))$, $1 \le p < \infty$, then

(59.7)    $\lim_{l \to \infty} \|A_l(f) - f\|_p = 0.$

This follows from uniform convergence when $f$ is a continuous function on $[0, 1]$,
and otherwise one can approximate by continuous functions using the uniform
bound (59.3). Of course, $L^p([0, 1))$ is the same as $L^p([0, 1])$, and so continuous
functions on $[0, 1]$ are still dense in this space when $p < \infty$.

If $f \in L^1([0, 1))$, then Lebesgue's theorem implies that (59.6) holds almost
everywhere on $[0, 1)$. More precisely,

(59.8)    $\lim_{l \to \infty} \frac{1}{|I_l(x)|} \int_{I_l(x)} |f(y) - f(x)| \, dy = 0$

for almost every $x \in [0, 1)$, where $I_l(x)$ denotes the dyadic interval of length
$2^{-l}$ that contains $x$. This follows from Lebesgue's theorem, as in Section 47,
and one can also establish it a bit more directly. Specifically, one can use the
estimate for the dyadic maximal function in the previous section, instead of the
estimate for the Hardy-Littlewood maximal function in Section 45.
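
The properties of the dyadic averages can be illustrated numerically. The sketch
below is a minimal Python example under stated assumptions: the function is
replaced by its values on $2^L$ equal cells (so that $A_L(f) = f$), the sample
function and the helper names `dyadic_average` and `lp_norm` are hypothetical,
and floating point arithmetic is used. It prints, for several levels $l$, the
$L^3$ norm of $A_l(f)$ and of $A_l(f) - f$.

```python
import math


def dyadic_average(values, level):
    """A_l(f) for a step function f given by its values on 2**L equal cells.

    Requires 0 <= level <= L.  The result is again a list of 2**L cell values,
    constant on each dyadic interval of length 2**-level.
    """
    n = len(values)
    block = n // 2 ** level                  # cells per dyadic interval at this level
    out = []
    for start in range(0, n, block):
        avg = sum(values[start:start + block]) / block
        out.extend([avg] * block)
    return out


def lp_norm(values, p):
    """L^p norm on [0, 1) of a step function with equal cells, 1 <= p < infinity."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


L = 10
n = 2 ** L
# Sample function sin(2 pi x) + x^2, evaluated at the midpoints of the cells.
f = [math.sin(2 * math.pi * (i + 0.5) / n) + ((i + 0.5) / n) ** 2 for i in range(n)]

for level in (0, 2, 4, 6, 8, 10):
    a = dyadic_average(f, level)
    diff = [x - y for x, y in zip(a, f)]
    print(level, lp_norm(a, 3), lp_norm(diff, 3))
```

In this discretized setting the norm of $A_l(f)$ never exceeds the norm of $f$, in
line with (59.3), and the norm of the difference shrinks as $l$ increases.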

60 Rademacher functions 

Let $r_1(x), r_2(x), \ldots$ be the functions defined on $[0, 1)$ by

(60.1)    $r_l(x) = 1$ when $j \, 2^{-l} \le x < (j + 1) \, 2^{-l}$ and $j$ is even,
          $r_l(x) = -1$ when $j \, 2^{-l} \le x < (j + 1) \, 2^{-l}$ and $j$ is odd.

Thus $r_l(x)$ is constant on each dyadic interval of length $2^{-l}$,

(60.2)    $\int_I r_l(x) \, dx = 0$

for each dyadic interval $I$ of length $2^{-l+1}$, and $|r_l(x)| = 1$ for every $x \in [0, 1)$
and positive integer $l$. These are known as the Rademacher functions on the
unit interval.

Let $X$ be the set of sequences $x = \{x_k\}_{k=1}^\infty$ with $x_k = 1$ or $-1$ for each $k$.
Equivalently, $X$ is the Cartesian product of a sequence of copies of $\{1, -1\}$. This
is a compact Hausdorff topological space with respect to the product topology,
which is homeomorphic to the usual middle-thirds Cantor set. There is a natural
continuous mapping $\beta$ from $X$ onto the closed unit interval $[0, 1]$, defined by

(60.3)    $\beta(x) = \sum_{k=1}^\infty \frac{x_k + 1}{2} \, 2^{-k}.$

Each element $x$ of $X$ corresponds to an infinite binary sequence $\{(x_k + 1)/2\}_{k=1}^\infty$,
and $\beta$ sends $x$ to the real number with that binary expansion. Every real number
in $[0, 1]$ has a binary expansion, and the binary expansion is unique for all but
a countable set of real numbers. Dyadic rational numbers of the form $j \, 2^{-l}$,
$0 < j < 2^l$, have two binary expansions, which agree up to a point where one
has a 1 followed by all 0's, and the other has a 0 followed by all 1's.

There is a natural Borel probability measure on $X$, which is the product
measure associated to $1$, $-1$ having probability $1/2$ in each coordinate. This
probability measure corresponds exactly to Lebesgue measure on $[0, 1]$ under
the mapping $\beta$. That $\beta$ fails to be one-to-one on a countable set does not
really matter here, since countable sets have measure 0. Thus $[0, 1)$ and $X$ are
basically the same as probability spaces. The Rademacher functions $r_l$ on $[0, 1)$
correspond, up to sign, to the coordinate functions $x \mapsto x_l$ on $X$, which are
independent identically distributed random variables.

In particular,

(60.4)    $\int_0^1 r_{l_1}(x) \, r_{l_2}(x) \cdots r_{l_n}(x) \, dx = 0$

when $1 \le l_1 < l_2 < \cdots < l_n$. Because of independence, the integral of the
product should be the same as the product of the individual integrals, each of
which is 0, by (60.2). One can see this more directly by observing that the
integral over each dyadic interval of length $2^{-l_n + 1}$ is 0, because the integral
of $r_{l_n}$ over such an interval is 0, as in (60.2), while the other functions in the
integral are constant over these intervals.
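
The vanishing of these integrals can be verified exactly for small indices. The
following Python sketch implements the Rademacher functions as in (60.1) and
integrates step functions exactly with rational arithmetic. The helper names
`rademacher`, `integrate_step`, and `product_integral` are assumptions made for
this illustration only.

```python
import math
from fractions import Fraction
from itertools import combinations


def rademacher(l, x):
    """r_l(x) for x in [0, 1) and a positive integer l, following (60.1)."""
    j = math.floor(x * 2 ** l)          # x lies in [j 2^-l, (j+1) 2^-l)
    return 1 if j % 2 == 0 else -1


def integrate_step(g, m, start=Fraction(0), end=Fraction(1)):
    """Exact integral of g over [start, end).

    Assumes g is constant on dyadic cells of length 2**-m and that start and
    end are multiples of 2**-m, so sampling each cell at its midpoint is exact.
    """
    h = Fraction(1, 2 ** m)
    steps = int((end - start) / h)
    return sum(g(start + (i + Fraction(1, 2)) * h) * h for i in range(steps))


def product_integral(indices):
    """Integral over [0, 1) of the product of r_l for l in indices."""
    m = max(indices)

    def g(x):
        value = 1
        for l in indices:
            value *= rademacher(l, x)
        return value

    return integrate_step(g, m)


# (60.2) with l = 3: the integral over each dyadic interval of length 2^-2.
print([integrate_step(lambda x: rademacher(3, x), 3, Fraction(j, 4), Fraction(j + 1, 4))
       for j in range(4)])

# (60.4) for every nonempty set of distinct indices in {1, 2, 3, 4}.
print(all(product_integral(s) == 0
          for size in range(1, 5)
          for s in combinations(range(1, 5), size)))
```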

61 $L^p$ estimates

The Rademacher functions are orthonormal in $L^2([0, 1))$, since

(61.1)    $\int_0^1 r_l(x)^2 \, dx = 1$

for each $l$, and

(61.2)    $\int_0^1 r_k(x) \, r_l(x) \, dx = 0$

when $k \ne l$. This implies that

(61.3)    $\Big\| \sum_{l=1}^n a_l \, r_l \Big\|_2 = \Big( \sum_{l=1}^n a_l^2 \Big)^{1/2}$

for every $a_1, \ldots, a_n \in \mathbf{R}$. Let us check that

(61.4)    $\Big\| \sum_{l=1}^n a_l \, r_l \Big\|_\infty = \sum_{l=1}^n |a_l|.$

The left side of (61.4) is clearly less than or equal to the right side, by the
triangle inequality. To get the opposite inequality, one can choose a dyadic
interval of length $2^{-n}$ on which $a_l \, r_l = |a_l|$ for $l = 1, \ldots, n$.

Before proceeding, it will be helpful to remember two basic facts about $L^p$
norms. The first is that

(61.5)    $\|f\|_p = \Big( \int_0^1 |f(x)|^p \, dx \Big)^{1/p}$

is monotone increasing in $p$, by Jensen's inequality. The second fact is that the
$L^p$ norm is logarithmically convex in $1/p$, which means that

(61.6)    $\|f\|_r \le \|f\|_p^t \, \|f\|_q^{1-t}$

when $p, q, r > 0$, $0 < t < 1$, and

(61.7)    $\frac{1}{r} = \frac{t}{p} + \frac{1-t}{q}.$

This can be derived from Hölder's inequality. It is a little simpler to start with
the $r = 1$ case, and then get (61.6) by applying the $r = 1$ case to $|f|^r$.
If $2 < p < \infty$, then there is a constant $C(p) > 0$ such that

(61.8)    $\Big\| \sum_{l=1}^n a_l \, r_l \Big\|_p \le C(p) \, \Big( \sum_{l=1}^n a_l^2 \Big)^{1/2}$

for every $a_1, \ldots, a_n \in \mathbf{R}$. Of course, it is very important here that $C(p)$ does
not depend on $n$. To prove (61.8), it suffices to restrict our attention to $p = 2^k$
for some integer $k \ge 2$, because of the monotonicity of the $L^p$ norm.
One can get better constants for the intermediate exponents using (61.3) and
(61.6). If $p = 2^k$, then one can expand

(61.9)    $\Big\| \sum_{l=1}^n a_l \, r_l \Big\|_{2^k}^{2^k} = \int_0^1 \Big( \sum_{l=1}^n a_l \, r_l(x) \Big)^{2^k} dx$

into a $2^k$-fold sum, where each term has the product of $2^k$ coefficients $a_l$ times
the integral of the product of $2^k$ Rademacher functions $r_l$. As in the previous
section, most of these integrals are equal to 0. The only way that the integral
is not equal to 0 is to have $r_l$ occur an even number of times for each $l$. In this
case, the integral is equal to 1, and the coefficients are products of $2^{k-1}$ factors
of $a_l^2$, $1 \le l \le n$. This permits one to estimate the $2^k$-fold sum by a constant
multiple of

(61.10)    $\Big( \sum_{l=1}^n a_l^2 \Big)^{2^{k-1}},$

as desired. The $k = 2$ case is already a nice exercise.






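If $0 < p < 2$, then there is a constant $C(p) > 0$ such that

(61.11)    $\Big( \sum_{l=1}^n a_l^2 \Big)^{1/2} \le C(p) \, \Big\| \sum_{l=1}^n a_l \, r_l \Big\|_p$

for every $a_1, \ldots, a_n \in \mathbf{R}$. Again, it is very important that $C(p)$ not depend on
$n$. This time, we can apply (61.6) to $f = \sum_{l=1}^n a_l \, r_l$, $r = 2$, and $q = 4$ to get
that

(61.12)    $\Big( \sum_{l=1}^n a_l^2 \Big)^{1/2} \le \Big\| \sum_{l=1}^n a_l \, r_l \Big\|_p^t \, \Big\| \sum_{l=1}^n a_l \, r_l \Big\|_4^{1-t}$

for some $t$, $0 < t < 1$, determined by $1/2 = t/p + (1 - t)/4$. Using the previous
estimate with $p = 4$, we get that

(61.13)    $\Big( \sum_{l=1}^n a_l^2 \Big)^{1/2} \le C(4)^{1-t} \, \Big( \sum_{l=1}^n a_l^2 \Big)^{(1-t)/2} \, \Big\| \sum_{l=1}^n a_l \, r_l \Big\|_p^t.$

This implies (61.11), with $C(p) = C(4)^{(1-t)/t}$, by dividing both sides by
$\big( \sum_{l=1}^n a_l^2 \big)^{(1-t)/2}$ and taking $t$th roots, at least when
$a_l \ne 0$ for some $l$.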
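
For small $n$, the norms in this section can be computed exactly by averaging
over sign patterns, since $(r_1, \ldots, r_n)$ takes each of the $2^n$ sign patterns on
exactly one dyadic interval of length $2^{-n}$. The Python sketch below does this
for sample coefficients; the function name `rademacher_sum_norm` and the data
are illustrative assumptions. It checks (61.3) and (61.4), and the $k = 2$ case of
the expansion, for which counting the terms where each index occurs an even
number of times gives the value $3 (\sum a_l^2)^2 - 2 \sum a_l^4$.

```python
import math
from itertools import product


def rademacher_sum_norm(a, p):
    """Exact L^p norm on [0, 1) of sum_l a_l r_l, by averaging over sign patterns."""
    n = len(a)
    values = [abs(sum(s * c for s, c in zip(signs, a)))
              for signs in product((1, -1), repeat=n)]
    if p == math.inf:
        return max(values)
    return (sum(v ** p for v in values) / 2 ** n) ** (1.0 / p)


a = [3.0, -1.0, 2.0, 0.5, 1.0]           # sample coefficients
s2 = sum(c ** 2 for c in a)              # sum of squares
s4 = sum(c ** 4 for c in a)              # sum of fourth powers

print(rademacher_sum_norm(a, 2), math.sqrt(s2))                 # (61.3)
print(rademacher_sum_norm(a, math.inf), sum(abs(c) for c in a)) # (61.4)
print(rademacher_sum_norm(a, 4) ** 4, 3 * s2 ** 2 - 2 * s4)     # k = 2 expansion
# Lower bound (61.11) with p = 1: here t = 1/3, so C(1) = C(4)^2 with C(4) = 3^(1/4).
print(math.sqrt(s2), math.sqrt(3) * rademacher_sum_norm(a, 1))
```

The last line compares the two sides of (61.11) for $p = 1$, using the constant
$C(4) = 3^{1/4}$ that comes from the $k = 2$ computation; this is a valid constant
for the argument in the text but not the best possible one.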



62 Rademacher sums 

Let $a_1, a_2, \ldots$ be a sequence of real numbers such that $\sum_{l=1}^\infty a_l^2$ converges, and
consider

(62.1)    $f(x) = \sum_{l=1}^\infty a_l \, r_l(x).$

This series converges in $L^2([0, 1))$, by the orthonormality of the Rademacher
functions. Moreover, the series converges in $L^p([0, 1))$ for every $p < \infty$, by the
estimates in the previous section. Using these estimates, one can also check that
this series converges in $L^p([0, 1))$ in the generalized sense for every $p < \infty$, as
in Section 14.
Observe that

(62.2)    $A_n(f)(x) = \sum_{l=1}^n a_l \, r_l(x)$

for every $n$, where $A_n$ is the dyadic averaging operator in Section 59. This
follows from the fact that $A_n(r_l) = r_l$ when $l \le n$, because $r_l$ is then constant
on dyadic intervals of length $2^{-n}$, and $A_n(r_l) = 0$ when $l > n$. By Lebesgue's
theorem,

(62.3)    $\lim_{n \to \infty} A_n(f)(x) = f(x)$

almost everywhere on $[0, 1)$, which implies that the series defining $f$ converges
almost everywhere. However, if $\sum_{l=1}^\infty a_l \, r_l(x)$ converges in the generalized sense
as a sum of real numbers for any $x \in [0, 1)$, then

(62.4)    $\sum_{l=1}^\infty |a_l \, r_l(x)| = \sum_{l=1}^\infty |a_l|$

converges, as in the earlier discussion of generalized convergence. Similarly, if
$f \in L^\infty([0, 1))$, then $A_n(f)$ is uniformly bounded, and hence $\sum_{l=1}^\infty |a_l|$
converges, by (61.4).

Let $\pi$ be a one-to-one mapping from the set $\mathbf{Z}_+$ of positive integers onto
itself, and let $X$ be the space of all sequences $\{x_k\}_{k=1}^\infty$ with $x_k = \pm 1$ for each $k$,
as in Section 60. Thus $\pi$ determines a measure-preserving homeomorphism from
$X$ onto itself, which sends $\{x_k\}_{k=1}^\infty$ to $\{x_{\pi(k)}\}_{k=1}^\infty$. Using this transformation,
one can check that $\sum_{l=1}^\infty a_{\pi(l)} \, r_{\pi(l)}(x)$ also converges almost everywhere. More
precisely, this rearrangement of the series corresponds to the composition of
$\sum_{l=1}^\infty a_{\pi(l)} \, r_l(x)$ with the automorphism on $X$ just mentioned. This new series
is of the same type as the previous one, and so converges almost everywhere for
the same reasons as before.



63 Lacunary series 

Let $\mathbf{T}$ be the unit circle in the complex plane, consisting of $z \in \mathbf{C}$ with $|z| = 1$.
It is well known that

(63.1)    $\int_{\mathbf{T}} z^j \, |dz| = 0$

for every nonzero integer $j$, where $|dz|$ denotes the element of arc length. If
$j = 0$, then $z^j$ is interpreted as being equal to 1, and the integral is equal
to $2\pi$, the circumference of the circle. The usual integral inner product for
complex-valued functions in $L^2(\mathbf{T})$ is defined by

(63.2)    $\langle f, g \rangle = \frac{1}{2\pi} \int_{\mathbf{T}} f(z) \, \overline{g(z)} \, |dz|,$

and the corresponding norm is given by

(63.3)    $\|f\|_2 = \Big( \frac{1}{2\pi} \int_{\mathbf{T}} |f(z)|^2 \, |dz| \Big)^{1/2}.$

The functions $z^j$, $j \in \mathbf{Z}$, are orthonormal with respect to this inner product,
because of (63.1) and the fact that the integral is equal to $2\pi$ when $j = 0$. It is
well known that the linear span of these functions is dense in $L^2(\mathbf{T})$, and more
precisely that their linear span is dense in the space of continuous functions
on $\mathbf{T}$ with respect to the supremum norm. This implies that $z^j$, $j \in \mathbf{Z}$, is an
orthonormal basis for $L^2(\mathbf{T})$.

Let $n_1 < n_2 < \cdots$ be a strictly increasing sequence of positive integers, and
let $a_1, a_2, \ldots$ be a sequence of complex numbers such that $\sum_{j=1}^\infty |a_j|^2$ converges.
Thus

(63.4)    $f(z) = \sum_{j=1}^\infty a_j \, z^{n_j}$

converges in $L^2(\mathbf{T})$, since the $z^{n_j}$'s are orthonormal in $L^2(\mathbf{T})$. We say that
(63.4) is a lacunary or gap series if there is a $q > 1$ such that

(63.5)    $n_{j+1} \ge q \, n_j$

for each $j$. In this case, (63.4) actually converges in $L^p(\mathbf{T})$ for each $p < \infty$. One
can also show that the series converges in the generalized sense in $L^p(\mathbf{T})$, as in
Section 14.

To see this, it suffices to show that for each $p \in (2, \infty)$ there is a constant
$C'(p) > 0$ such that

(63.6)    $\Big\| \sum_{j=1}^L a_j \, z^{n_j} \Big\|_p \le C'(p) \, \Big( \sum_{j=1}^L |a_j|^2 \Big)^{1/2}$

for every $a_1, \ldots, a_L \in \mathbf{C}$ and $L \ge 1$. It is also enough to do this when $p = 2^k$ for
some integer $k \ge 2$. In this case, the $p$th power of the $L^p$ norm can be expanded
into a $2^k$-fold sum, as before. More precisely,

(63.7)    $\Big| \sum_{j=1}^L a_j \, z^{n_j} \Big|^{2^k} = \Big( \sum_{j=1}^L a_j \, z^{n_j} \Big)^{2^{k-1}} \Big( \sum_{j=1}^L \overline{a_j} \, \overline{z}^{n_j} \Big)^{2^{k-1}},$

since $|a|^2 = a \, \overline{a}$ for every $a \in \mathbf{C}$. Thus each term in the $2^k$-fold sum has $2^{k-1}$
$a_j$'s and $z^{n_j}$'s, and $2^{k-1}$ $\overline{a_j}$'s and $\overline{z}^{n_j}$'s.

Each term is also integrated over $\mathbf{T}$, and so includes an expression of the
form

(63.8)    $\int_{\mathbf{T}} \Big( \prod_{i=1}^{2^{k-1}} z^{n_{j_i}} \Big) \Big( \prod_{i'=1}^{2^{k-1}} \overline{z}^{n_{j'_{i'}}} \Big) \, |dz|,$

where the $j_i$'s and $j'_{i'}$'s are integers between 1 and $L$. Because of (63.1), this
integral is equal to 0 unless

(63.9)    $\sum_{i=1}^{2^{k-1}} n_{j_i} - \sum_{i'=1}^{2^{k-1}} n_{j'_{i'}} = 0.$

If $q$ is large enough, depending on $k$, then the only way that this can happen is
if the largest of the $n_{j_i}$'s is equal to the largest of the $n_{j'_{i'}}$'s. One can then repeat
the argument to get that the $n_{j_i}$'s and $n_{j'_{i'}}$'s are permutations of each other.
This permits the $2^k$-fold sum to be estimated in terms of
$\big( \sum_{j=1}^L |a_j|^2 \big)^{2^{k-1}}$,
as in Section 61. If $q$ is not sufficiently large for this argument, then one can
express (63.4) as a sum of finitely many lacunary series with larger gaps. More
precisely, (63.4) can be expressed as the sum of $r$ lacunary series with gaps of
size $q^r$ for each positive integer $r$, by taking every $r$th term in the series.
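
The combinatorial step can be explored by brute force in the case $k = 2$, where
the condition is $n_{j_1} + n_{j_2} = n_{j'_1} + n_{j'_2}$. The Python sketch below is an
illustration under stated assumptions: the exponents $n_j = 3^j$ and the sample
coefficients are chosen here, not taken from the notes, and the helper names are
hypothetical. It lists the solutions, checks that they are all permutations, and
evaluates the normalized integral of $|f|^4$ as the corresponding sum of products
of coefficients.

```python
from itertools import product


def solutions(exponents):
    """All index tuples (j1, j2, k1, k2) with n_j1 + n_j2 == n_k1 + n_k2."""
    idx = range(len(exponents))
    return [(j1, j2, k1, k2)
            for j1, j2, k1, k2 in product(idx, repeat=4)
            if exponents[j1] + exponents[j2] == exponents[k1] + exponents[k2]]


def only_permutations(exponents):
    """True if every solution has {j1, j2} equal to {k1, k2} as multisets."""
    return all(sorted((j1, j2)) == sorted((k1, k2))
               for j1, j2, k1, k2 in solutions(exponents))


def fourth_moment(a, exponents):
    """(1 / 2 pi) times the integral over T of |sum_j a_j z^{n_j}|^4, using (63.1)."""
    total = sum(a[j1] * a[j2] * a[k1].conjugate() * a[k2].conjugate()
                for j1, j2, k1, k2 in solutions(exponents))
    return total.real


gaps = [3 ** j for j in range(1, 6)]            # lacunary with q = 3 (assumed example)
a = [1 + 2j, -1j, 0.5 + 0j, 2 + 0j, -1 + 1j]    # sample complex coefficients
s2 = sum(abs(c) ** 2 for c in a)
s4 = sum(abs(c) ** 4 for c in a)

print(only_permutations(gaps))                   # lacunary exponents
print(only_permutations(list(range(1, 7))))      # consecutive exponents, e.g. 1 + 4 = 2 + 3
print(fourth_moment(a, gaps), 2 * s2 ** 2 - s4)
```

When only permutations occur, the sum collapses to $2 (\sum |a_j|^2)^2 - \sum |a_j|^4$,
which is bounded by a constant multiple of $(\sum |a_j|^2)^2$ as in the text.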



64 Walsh functions 

If $I = \{l_1, \ldots, l_n\}$ is a finite set of positive integers, then the corresponding
Walsh function $w_I$ on $[0, 1)$ is defined by

(64.1)    $w_I(x) = r_{l_1}(x) \, r_{l_2}(x) \cdots r_{l_n}(x),$

where the $r_l$'s are Rademacher functions. If $I = \emptyset$, then we take $w_I$ to be the
constant function 1. Thus

(64.2)    $|w_I(x)| = 1$

for every $x \in [0, 1)$ and finite set $I$ of positive integers, and

(64.3)    $\int_0^1 w_I(x) \, dx = 0$

when $I \ne \emptyset$, as in Section 60. This implies that

(64.4)    $\int_0^1 w_I(x) \, w_{I'}(x) \, dx = 0$

when $I \ne I'$, because $r_l(x)^2 = 1$ and so $w_I \, w_{I'}$ is the Walsh function associated
to the symmetric difference of $I$ and $I'$, which is nonempty. Hence the Walsh
functions are orthonormal in $L^2([0, 1))$.

The Walsh functions actually form an orthonormal basis for $L^2([0, 1))$. To
see this, it suffices to show that the linear span of the Walsh functions is dense
in $L^2([0, 1))$. Note that $w_I(x)$ is constant on dyadic intervals of length $2^{-n}$
when $I \subseteq \{1, \ldots, n\}$, because of the corresponding property of the Rademacher
functions. One can check that the linear span of the Walsh functions $w_I$ with
$I \subseteq \{1, \ldots, n\}$ is exactly the same as the space of functions on $[0, 1)$ that are
constant on dyadic intervals of length $2^{-n}$. Both spaces have dimension $2^n$,
for instance, since there are $2^n$ subsets of $\{1, \ldots, n\}$, and $2^n$ dyadic intervals of
length $2^{-n}$. It follows that the linear span of all Walsh functions is the space
of dyadic step functions on $[0, 1)$, which are the functions that are constant on
dyadic intervals of length $2^{-n}$ for some $n$. Hence the Walsh functions form an
orthonormal basis of $L^2([0, 1))$, because the dyadic step functions are dense in
$L^2([0, 1))$.

There is another description of the Walsh functions in terms of harmonic
analysis. Let $X$ be the space of sequences $\{x_k\}_{k=1}^\infty$ with $x_k = \pm 1$ for each
$k$, as in Section 60. It is easy to see that $X$ is a commutative group with
respect to coordinatewise multiplication. More precisely, $X$ is a topological
group with respect to the product topology, because the group operations are
continuous with respect to this topology. Note that the probability measure
on $X$ described before is invariant under translations defined by this group
structure, and hence corresponds to Haar measure on $X$. The Rademacher
functions may be identified with the coordinate functions on $X$, and so the Walsh
functions may be identified with products of coordinate functions on $X$. One can
check that these are continuous homomorphisms from $X$ into the multiplicative
group of nonzero complex numbers, and that every such homomorphism arises
in this way.
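
The orthonormality and dimension count for Walsh functions with
$I \subseteq \{1, \ldots, n\}$ can be checked directly by tabulating their values on the $2^n$
dyadic cells of length $2^{-n}$. The following Python sketch does this for $n = 3$;
the names `rademacher_on_cell`, `walsh_table`, and `inner_product` are
assumptions introduced for this illustration.

```python
from itertools import combinations


def rademacher_on_cell(l, i, n):
    """Value of r_l on the i-th dyadic cell of length 2**-n (requires l <= n)."""
    j = i >> (n - l)          # index of the length-2**-l dyadic interval containing the cell
    return 1 if j % 2 == 0 else -1


def walsh_table(n):
    """Map each subset I of {1, ..., n} (as a sorted tuple) to the values of w_I on the cells."""
    table = {}
    for size in range(n + 1):
        for subset in combinations(range(1, n + 1), size):
            row = []
            for i in range(2 ** n):
                value = 1
                for l in subset:
                    value *= rademacher_on_cell(l, i, n)
                row.append(value)
            table[subset] = row
    return table


def inner_product(u, v):
    """L^2([0, 1)) inner product of two real step functions given by cell values."""
    return sum(s * t for s, t in zip(u, v)) / len(u)


table = walsh_table(3)
print(len(table))                                   # number of Walsh functions tabulated
print(all(inner_product(table[I], table[J]) == (1 if I == J else 0)
          for I in table for J in table))           # orthonormality
```

Since the tabulated functions are orthonormal and there are as many of them as
cells, they span the functions that are constant on dyadic intervals of length
$2^{-n}$, which is the finite-dimensional statement used in the text.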



65 Independent random variables 

Let $(X_1, \mu_1), \ldots, (X_n, \mu_n)$ be probability spaces, and let $X = X_1 \times \cdots \times X_n$ be
their product, with the product measure $\mu = \mu_1 \times \cdots \times \mu_n$. Also let $f_1, \ldots, f_n$
be real or complex-valued functions on $X_1, \ldots, X_n$, respectively, which can be
identified with functions on $X$ that are constant in the other variables. Suppose
that $f_j \in L^2(X_j, \mu_j)$,

(65.1)    $\int_{X_j} f_j \, d\mu_j = 0,$

and

(65.2)    $\|f_j\|_{L^2(X_j, \mu_j)} = \Big( \int_{X_j} |f_j|^2 \, d\mu_j \Big)^{1/2} = 1$

for each $j$. It may be that the $(X_j, \mu_j)$'s are copies of the same space, for
instance, and that the $f_j$'s are copies of the same function on this space. As
functions on $X$, it is easy to see that $f_1, \ldots, f_n$ are orthonormal in $L^2(X, \mu)$.
This is because

(65.3)    $\int_X f_j \, f_l \, d\mu = \Big( \int_{X_j} f_j \, d\mu_j \Big) \Big( \int_{X_l} f_l \, d\mu_l \Big) = 0$

when $j \ne l$ in the real case, and

(65.4)    $\int_X f_j \, \overline{f_l} \, d\mu = \Big( \int_{X_j} f_j \, d\mu_j \Big) \Big( \int_{X_l} \overline{f_l} \, d\mu_l \Big) = 0$

in the complex case. Hence

(65.5)    $\Big\| \sum_{j=1}^n a_j \, f_j \Big\|_{L^2(X, \mu)} = \Big( \sum_{j=1}^n |a_j|^2 \Big)^{1/2}$

for any real or complex numbers $a_1, \ldots, a_n$, as appropriate.

Let $k$ be a positive integer, and put $p = 2^k$. Suppose in addition that
$f_j \in L^p(X_j, \mu_j)$ for each $j$, and that

(65.6)    $\|f_j\|_{L^p(X_j, \mu_j)} = \Big( \int_{X_j} |f_j|^p \, d\mu_j \Big)^{1/p} \le L_p$

for some $L_p > 0$ and $j = 1, \ldots, n$. In this case, one can show that

(65.7)    $\Big\| \sum_{j=1}^n a_j \, f_j \Big\|_{L^p(X, \mu)} \le C(p, L_p) \, \Big( \sum_{j=1}^n |a_j|^2 \Big)^{1/2}$

for some constant $C(p, L_p) > 0$ and all $a_1, \ldots, a_n \in \mathbf{R}$ or $\mathbf{C}$, as appropriate.
As usual, it is very important that $C(p, L_p)$ does not depend on $n$ here. To see
this, one can expand

(65.8)    $\Big\| \sum_{j=1}^n a_j \, f_j \Big\|_{L^p(X, \mu)}^p = \int_X \Big| \sum_{j=1}^n a_j \, f_j \Big|^p \, d\mu$

into a $2^k$-fold sum, where each term is a product of $2^k$ $a_j$'s and perhaps their
complex conjugates times the integral of a product of $2^k$ $f_j$'s and perhaps their
complex conjugates, as in Sections 61 and 63. The integrals can be estimated
individually using Hölder's inequality and the hypothesis that the $f_j$'s have
bounded $L^p$ norms. The main point is that the integral is equal to 0 whenever
an $f_j$ occurs exactly once for some $j$, because the integral over $X$ of a product of
$f_j$'s and perhaps their complex conjugates is equal to the product of the integrals
over the $X_j$'s of the corresponding factors for $j = 1, \ldots, n$. In the remaining terms,
there is a product of $2^k$ $a_j$'s and perhaps their complex conjugates, in which
each $a_j$ either does not occur or occurs more than once. This permits one to
estimate the sum by a constant multiple of

(65.9)    $\Big( \sum_{j=1}^n |a_j|^2 \Big)^{2^{k-1}}$

as before. This is a bit more complicated than in the context of Rademacher
functions, where the integrals are equal to 0 when any $f_j$ occurs an odd number
of times. However, one can use the monotonicity of $\ell^p$ norms in $p$, as in the
earlier discussion of these norms, to deal with this.

These estimates for $p = 2^k$ imply analogous estimates for $2 < p < 2^k$, as
in Section 61. In particular, there are analogous estimates for every $p \in (2, \infty)$
when the $f_j$'s have bounded $L^p$ norms for each $p \in (2, \infty)$. Using the upper
bound for $k = 2$, one also gets that

(65.10)    $\Big( \sum_{j=1}^n |a_j|^2 \Big)^{1/2} \le C(p, L_4) \, \Big\| \sum_{j=1}^n a_j \, f_j \Big\|_{L^p(X, \mu)}$

for $0 < p < 2$, as in Section 61. Here $C(p, L_4)$ is a positive constant that does
not depend on $n$, but does depend on $p$ and the upper bound $L_4$ for the $L^4$
norms of the $f_j$'s.

Suppose now that $(X_1, \mu_1), (X_2, \mu_2), \ldots$ is an infinite sequence of probability
spaces, $X = \prod_{j=1}^\infty X_j$ is their product, and $\mu$ is the corresponding product
measure on $X$. Let $f_1, f_2, \ldots$ be real or complex-valued functions on $X_1, X_2, \ldots$,
respectively, which can be identified with functions on $X$ that are constant in
the other variables. As before, suppose also that $f_j \in L^2(X_j, \mu_j)$ satisfies (65.1)
and (65.2) for each $j$, so that the $f_j$'s are orthonormal in $L^2(X, \mu)$. If $a_1, a_2, \ldots$
is a sequence of real or complex numbers such that $\sum_{j=1}^\infty |a_j|^2$ converges, then
$\sum_{j=1}^\infty a_j \, f_j$ converges in $L^2(X, \mu)$. If $k \in \mathbf{Z}_+$, $p = 2^k$, and $f_j \in L^p(X_j, \mu_j)$
for each $j$, with uniformly bounded $L^p$ norm, then it follows from the previous
estimates that $\sum_{j=1}^\infty a_j \, f_j$ converges in $L^p(X, \mu)$. More precisely, $\sum_{j=1}^\infty a_j \, f_j$
converges in $L^p(X, \mu)$ in the generalized sense, as in Section 14. In particular,
if $f_j \in L^p(X_j, \mu_j)$ for every $j \ge 1$ and $p \in (2, \infty)$, with $\|f_j\|_{L^p(X_j, \mu_j)}$ uniformly
bounded in $j$ for each $p > 2$, then $\sum_{j=1}^\infty a_j \, f_j$ converges in $L^p(X, \mu)$ in the
generalized sense for each $p \in (2, \infty)$. If the $(X_j, \mu_j)$'s are copies of the same
space, and the $f_j$ are copies of the same function on this space, then of course
the $f_j$'s have the same $L^p$ norm for each $j$.



66 Linear functions on $\mathbf{R}^n$

Let $\mu$ be a Borel probability measure on $\mathbf{R}^n$ that is not the Dirac mass at 0, so
that

(66.1)    $\mu(\mathbf{R}^n \setminus \{0\}) > 0.$

Remember that a linear transformation $T$ from $\mathbf{R}^n$ onto itself is said to be an
orthogonal transformation if $T$ preserves the standard inner product on $\mathbf{R}^n$, and
hence the standard Euclidean norm on $\mathbf{R}^n$. Suppose that $\mu$ is invariant under
orthogonal transformations, in the sense that

(66.2)    $\mu(T(E)) = \mu(E)$

for every Borel set $E \subseteq \mathbf{R}^n$ and every orthogonal transformation $T$ on $\mathbf{R}^n$. For
example, $\mu$ might be surface measure on the unit sphere normalized to have
total measure 1, or $\mu$ could be absolutely continuous with respect to Lebesgue
measure, with a radial density. Also let $p$ be a positive real number, and suppose
that

(66.3)    $\int_{\mathbf{R}^n} |x|^p \, d\mu(x) < \infty.$

Note that this integral is positive, by hypothesis. If $\mu$ is normalized surface
measure on the unit sphere, then this condition holds for every $p > 0$. If $\mu$ is
given by a radial density times Lebesgue measure, then this condition depends
on the integrability properties of the density.

Consider

(66.4)    $\lambda_v(x) = \sum_{j=1}^n x_j \, v_j$

for each $v \in \mathbf{R}^n$. This is a linear function on $\mathbf{R}^n$, and every real-valued linear
function on $\mathbf{R}^n$ is of this form. By hypothesis, $\lambda_v \in L^p(\mathbf{R}^n, \mu)$ for each $v \in \mathbf{R}^n$.
Because of invariance under orthogonal transformations,

(66.5)    $\|\lambda_v\|_{L^p(\mathbf{R}^n, \mu)} = \Big( \int_{\mathbf{R}^n} |\lambda_v(x)|^p \, d\mu(x) \Big)^{1/p} = C(p, \mu) \, |v|,$

where

(66.6)    $C(p, \mu) = \Big( \int_{\mathbf{R}^n} |x_1|^p \, d\mu(x) \Big)^{1/p}$

and

(66.7)    $|v| = \Big( \sum_{j=1}^n v_j^2 \Big)^{1/2}$

is the standard norm on $\mathbf{R}^n$. Note that $0 < C(p, \mu) < \infty$.
Remember that

(66.8)    $\int_{-\infty}^\infty \exp(-t^2) \, dt = \sqrt{\pi}.$

To see this, one can begin with

(66.9)    $\Big( \int_{-\infty}^\infty \exp(-t^2) \, dt \Big)^2 = \Big( \int_{-\infty}^\infty \exp(-t^2) \, dt \Big) \Big( \int_{-\infty}^\infty \exp(-u^2) \, du \Big)$
          $= \int_{\mathbf{R}^2} \exp(-t^2 - u^2) \, dt \, du.$

Using polar coordinates, we get that

(66.10)    $\Big( \int_{-\infty}^\infty \exp(-t^2) \, dt \Big)^2 = 2\pi \int_0^\infty r \, \exp(-r^2) \, dr.$

The derivative of $\exp(-r^2)$ is $-2 r \exp(-r^2)$, and so

(66.11)    $\int_0^\infty 2 r \, \exp(-r^2) \, dr = 1.$

This implies (66.8), as desired.

Let $\mu_n$ be the measure on $\mathbf{R}^n$ given by $\pi^{-n/2} \exp(-|x|^2)$ times Lebesgue
measure. Thus $\mu_n(\mathbf{R}^n) = 1$, by the previous computations, and $\mu_n$ is clearly
invariant under orthogonal transformations. Also, $|x| \in L^p(\mathbf{R}^n, \mu_n)$ for every
$p > 0$, which is the condition (66.3). Moreover, $\mu_n$ is the same as the product of
$n$ copies of $\mu_1$ on $n$ copies of $\mathbf{R}$, as in the previous section.
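
The values in (66.8) and (66.11) are easy to confirm numerically. The sketch
below uses a plain midpoint rule on a truncated interval; the function names,
the truncation width, and the number of steps are assumptions chosen for this
illustration, and the truncation is harmless because $\exp(-t^2)$ is negligible
for $|t| \ge 8$.

```python
import math


def midpoint_rule(g, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def gaussian_integral(half_width=8.0):
    """Approximation of the integral of exp(-t^2) over the real line, as in (66.8)."""
    return midpoint_rule(lambda t: math.exp(-t * t), -half_width, half_width)


def radial_integral(upper=8.0):
    """Approximation of the integral of 2 r exp(-r^2) over [0, infinity), as in (66.11)."""
    return midpoint_rule(lambda r: 2 * r * math.exp(-r * r), 0.0, upper)


print(gaussian_integral(), math.sqrt(math.pi))
print(radial_integral())
```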



67 Countability conditions 

Remember that a collection $\beta$ of open subsets of a topological space $X$ is said
to be a base for the topology of $X$ if for every open set $U$ in $X$ and every point
$p \in U$ there is an open set $V \in \beta$ such that $p \in V$ and $V \subseteq U$. In this case,

(67.1)    $U = \bigcup \{V : V \in \beta, \ V \subseteq U\}$

for every open set $U$ in $X$. Conversely, $\beta$ is a base for the topology of $X$ if every
open set in $X$ can be expressed as a union of elements of $\beta$. It is especially
nice to have a base $\beta$ for the topology of $X$ with only finitely or countably
many elements. This implies that there is a dense set in $X$ with only finitely or
countably many elements, by picking an element in each nonempty open set in
the base. Conversely, if the topology on $X$ is determined by a metric, and if there
is a dense set in $X$ with only finitely or countably many elements, then there
is a base for the topology of $X$ with only finitely or countably many elements.
More precisely, the collection of open balls in $X$ with centers contained in a
dense subset of $X$ and radii of the form $1/n$, $n \in \mathbf{Z}_+$, is a base for the topology
of $X$.

Suppose that $\beta$ is a base for the topology of $X$ with only finitely or countably
many elements, and let $\{U_i\}_{i \in I}$ be a collection of open subsets of $X$. For each
$i \in I$, let $\beta_i$ be the set of $V \in \beta$ such that $V \subseteq U_i$. Thus

(67.2)    $U_i = \bigcup \{V : V \in \beta_i\}$

for each $i \in I$, because $\beta$ is a base for the topology of $X$. If $\beta' = \bigcup_{i \in I} \beta_i$, then
it follows that

(67.3)    $\bigcup_{i \in I} U_i = \bigcup \{V : V \in \beta'\}.$

For each $V \in \beta'$, let $i(V)$ be an element of $I$ such that $V \subseteq U_{i(V)}$. Also let $I'$
be the set of $i(V)$, $V \in \beta'$. Note that $I'$ has only finitely or countably many
elements, because $\beta' \subseteq \beta$ has only finitely or countably many elements. In
addition,

(67.4)    $\bigcup_{i \in I'} U_i \subseteq \bigcup_{i \in I} U_i = \bigcup \{V : V \in \beta'\} \subseteq \bigcup \{U_{i(V)} : V \in \beta'\} = \bigcup_{i \in I'} U_i,$

which implies that $\bigcup_{i \in I'} U_i = \bigcup_{i \in I} U_i$.

A set $E \subseteq X$ is said to be $\sigma$-compact if there is a sequence $K_1, K_2, \ldots$ of
compact subsets of $X$ such that $E = \bigcup_{n=1}^\infty K_n$. Suppose that $X$ is a locally
compact Hausdorff space, and that $U$ is an open set in $X$. For each $p \in U$, let
$U(p)$ be an open set in $X$ such that $p \in U(p)$, $\overline{U(p)}$ is compact, and $\overline{U(p)} \subseteq U$.
If there is a base for the topology of $X$ with only finitely or countably many
elements, then it follows that there is a set $A \subseteq U$ with only finitely or countably
many elements such that $U = \bigcup_{p \in A} U(p)$. Hence $U = \bigcup_{p \in A} \overline{U(p)}$, so that $U$ is
$\sigma$-compact.

Suppose that $X$ is a locally compact Hausdorff space in which every open set
is $\sigma$-compact. As in Theorem 2.18 in [130], every positive Borel measure $\mu$ on
$X$ such that $\mu(K) < \infty$ when $K \subseteq X$ is compact automatically satisfies strong
regularity properties. It is easy to see that the real line has this property, for
instance, as well as $\mathbf{R}^n$ for every positive integer $n$. If $X$ is a locally compact
Hausdorff space, and there is a base for the topology of $X$ with only finitely
or countably many elements, then $X$ has this property, by the remarks in the
previous paragraph.

68 Separation conditions 

Remember that a topological space $X$ satisfies the first separation condition if
for every pair of distinct elements $p$, $q$ of $X$ there is an open set $U \subseteq X$ such
that $p \in U$ and $q \notin U$. This is equivalent to asking that every set $A \subseteq X$
with exactly one element be closed, which implies that finite subsets of $X$ are
closed. Similarly, $X$ satisfies the second separation condition if for every pair $p$,
$q$ of distinct elements of $X$ there are disjoint open subsets $U$, $V$ of $X$ such that
$p \in U$, $q \in V$. In this case, $X$ is said to be a Hausdorff topological space, and $X$
clearly satisfies the first separation condition. If $X$ satisfies the first separation
condition and for every point $p \in X$ and closed set $B \subseteq X$ with $p \notin B$ there are
disjoint open subsets $U$, $V$ of $X$ such that $p \in U$ and $B \subseteq V$, then $X$ satisfies
the third separation condition, and is also said to be regular. Note that regular
topological spaces are Hausdorff, since one can take $B = \{q\}$ when $q \in X$ and
$q \ne p$. If $X$ satisfies the first separation condition and for every pair $A$, $B$ of
disjoint closed subsets of $X$ there are disjoint open sets $U$, $V$ such that $A \subseteq U$,
$B \subseteq V$, then $X$ satisfies the fourth separation condition, and is also said to be
normal. As before, normal spaces are automatically Hausdorff and regular. It
is well known that metric spaces are normal.

Equivalently, $X$ is Hausdorff if for every pair of distinct elements $p$, $q$ of $X$
there is an open set $U \subseteq X$ such that $p \in U$ and $q$ is not in the closure $\overline{U}$ of
$U$. Similarly, $X$ satisfies the third separation condition if and only if it satisfies
the first separation condition and for every point $p \in X$ and open set $W \subseteq X$
with $p \in W$ there is an open set $U \subseteq X$ such that $p \in U$ and $\overline{U} \subseteq W$. This
formulation of regularity makes it clear that it is a local property. In the same
way, $X$ is normal if and only if it satisfies the first separation condition and for
every closed set $A \subseteq X$ and open set $W \subseteq X$ with $A \subseteq W$ there is an open set
$U \subseteq X$ such that $A \subseteq U$ and $\overline{U} \subseteq W$.

If X is Hausdorff, then compact subsets of X are closed, and one can show 
that X satisfies the analogues of regularity and normality for compact sets 
instead of closed sets. This implies that compact Hausdorff spaces are normal, 
because closed sets of compact spaces are compact. If X is regular, then one 
can show that X satisfies the analogue of normality in which at least one of 
the closed sets is compact. One can also show that locally compact Hausdorff 
spaces are regular. 

It is easy to see that the Cartesian product of a family of topological spaces 
that satisfy the first or second separation condition has the same property with 
respect to the product topology. This is because a pair of distinct elements of the 
product are different in at least one coordinate, and the appropriate separation 
condition can then be applied in the corresponding space. One can also check 
that a product of regular spaces is regular. This uses the local characterization 
of regularity mentioned before. 

69 Metrizability 

Let $(X, d(x, y))$ be a metric space, and put

(69.1)    $B(p, r) = \{x \in X : d(p, x) < r\}$

for each $p \in X$ and $r > 0$. This is the open ball in $X$ with center $p$ and radius
$r$, which is well known to be an open set in $X$, by the triangle inequality. If
$A \subseteq X$ and $r > 0$, then

(69.2)    $A_r = \bigcup_{p \in A} B(p, r) = \{x \in X : d(x, p) < r \text{ for some } p \in A\}$

is an open set in $X$ that contains $A$. It is easy to check that

(69.3)    $\overline{A} = \bigcap_{r > 0} A_r = \bigcap_{n=1}^\infty A_{1/n},$

where $\overline{A}$ denotes the closure of $A$ in $X$. In particular, every closed set in $X$
can be expressed as the intersection of a sequence of open sets. This implies
that every open set in $X$ can be expressed as the union of a sequence of closed
sets. If $X$ is compact, then every closed set in $X$ is compact, and hence every
open set in $X$ is $\sigma$-compact. If $X$ is $\sigma$-compact, then every closed set in $X$ is
$\sigma$-compact, and it follows that every open set in $X$ is $\sigma$-compact as well.

Now let $(X_1, d_1), (X_2, d_2), \ldots$ be a sequence of metric spaces, and let
$X = \prod_{j=1}^\infty X_j$ be their Cartesian product, with the product topology. One can check
that

(69.4)    $d(x, y) = \max_{j \ge 1} \min(d_j(x_j, y_j), 1/j)$

defines a metric on $X$ for which the corresponding topology is the product
topology, where $x = \{x_j\}_{j=1}^\infty$, $y = \{y_j\}_{j=1}^\infty$. In particular, $X$ may be considered
as a compact metric space when $X_j$ is compact for each $j$.
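
The formula (69.4) can be tried out on finitely many coordinates. The Python
sketch below is a minimal illustration under stated assumptions: only the first
$N$ coordinates are used, each coordinate space is the real line with the usual
distance, and the names `product_metric` and `absolute_difference` are
hypothetical. It tests the triangle inequality on random samples and shows that
a change in the $j$th coordinate alone moves a point by at most $1/j$.

```python
import random


def product_metric(x, y, metrics):
    """d(x, y) = max_j min(d_j(x_j, y_j), 1/j) as in (69.4), over finitely many coordinates."""
    return max(min(d(xj, yj), 1.0 / j)
               for j, (d, xj, yj) in enumerate(zip(metrics, x, y), start=1))


def absolute_difference(s, t):
    """Usual metric on the real line, used here for every coordinate."""
    return abs(s - t)


N = 6
metrics = [absolute_difference] * N
random.seed(0)


def random_point():
    return [random.uniform(0.0, 3.0) for _ in range(N)]


triangle_ok = True
for _ in range(1000):
    x, y, z = random_point(), random_point(), random_point()
    if product_metric(x, z, metrics) > product_metric(x, y, metrics) + product_metric(y, z, metrics) + 1e-12:
        triangle_ok = False
print(triangle_ok)

# Changing only the 5th coordinate moves a point by at most 1/5.
x = random_point()
y = list(x)
y[4] += 2.0
print(product_metric(x, y, metrics))
```

The cap $1/j$ on the $j$th coordinate is what makes distant coordinates matter
less, which matches the description of the product topology by finitely many
coordinates at a time.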

Urysohn's famous metrization theorem implies that there is a metric on
a topological space $X$ that determines the same topology when $X$ is regular
and there is a countable base for the topology of $X$. If $X$ is compact, and the
topology on $X$ is determined by a metric, then it is easy to show that there is
a dense set in $X$ with only finitely or countably many elements, which implies
that there is a base for the topology of $X$ with only finitely or countably many
elements. This also works when $X$ is $\sigma$-compact. Thus a base for the topology
of $X$ with only finitely or countably many elements is necessary for metrizability
of a compact or $\sigma$-compact topological space.



70 Partitions of unity 

Let $X$ be a compact Hausdorff topological space. Suppose that for each $p \in X$,
we have an open set $U(p)$ in $X$ such that $p \in U(p)$. By Urysohn's lemma, there
is a nonnegative continuous real-valued function $\phi_p(x)$ on $X$ such that $\phi_p(p) > 0$
and the support of $\phi_p$ is contained in $U(p)$. If

(70.1)    $U_1(p) = \{x \in X : \phi_p(x) > 0\},$

then $U_1(p)$ is an open set in $X$ such that $p \in U_1(p)$ and $U_1(p) \subseteq U(p)$. By
compactness, there are finitely many elements $p_1, \ldots, p_n$ of $X$ such that

(70.2)    $X = \bigcup_{j=1}^n U_1(p_j).$

This implies that $\sum_{j=1}^n \phi_{p_j}(x) > 0$ for every $x \in X$. Hence

(70.3)    $\psi_j(x) = \frac{\phi_{p_j}(x)}{\sum_{l=1}^n \phi_{p_l}(x)}$

defines a nonnegative continuous real-valued function on $X$. Also,

(70.4)    $\sum_{j=1}^n \psi_j(x) = 1$

for every $x \in X$, and $\psi_j(x) > 0$ if and only if $\phi_{p_j}(x) > 0$.

As an application, let $V$ be a real or complex vector space equipped with
a norm $\|v\|$, and let $f$ be a continuous mapping from $X$ into $V$. Let $\epsilon > 0$ be
given, and for each $p \in X$ let $U(p)$ be an open set in $X$ such that $p \in U(p)$ and

(70.5)    $\|f(x) - f(p)\| < \epsilon$

for every $x \in U(p)$. Put

(70.6)    $g(x) = \sum_{j=1}^n \psi_j(x) \, f(p_j),$

where $p_1, \ldots, p_n$ and $\psi_1, \ldots, \psi_n$ are as in the previous paragraph. Thus

(70.7)    $\|f(x) - g(x)\| \le \sum_{j=1}^n \psi_j(x) \, \|f(x) - f(p_j)\| < \epsilon$

for every $x \in X$, using (70.5) and the fact that $x \in U(p_j)$ when $\psi_j(x) > 0$. The
same argument works when the topology on $V$ is determined by a collection $\mathcal{N}$
of seminorms, and $\|v\|$ is replaced by the maximum of finitely many seminorms
in $\mathcal{N}$.
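
The construction can be carried out explicitly on $X = [0, 1]$. The Python sketch
below is an illustration under stated assumptions: tent functions play the role
of the functions from Urysohn's lemma, the target function $\sin(3x)$, the
centers, and the helper names are all chosen here for the example. Since this
function has Lipschitz constant 3, taking the tents to be positive only within
distance $\epsilon / 3$ of their centers guarantees (70.5) on the sets where they
are positive.

```python
import math


def tent(center, radius):
    """Continuous bump that is positive exactly on (center - radius, center + radius)."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / radius)


def partition_of_unity(centers, radius):
    """Return psi(j, x), the normalized bumps as in (70.3)."""
    bumps = [tent(c, radius) for c in centers]

    def psi(j, x):
        total = sum(b(x) for b in bumps)     # positive because the bumps cover [0, 1]
        return bumps[j](x) / total

    return psi


def approximate(f, centers, radius):
    """Return g(x) = sum_j psi_j(x) f(p_j), as in (70.6)."""
    psi = partition_of_unity(centers, radius)
    return lambda x: sum(psi(j, x) * f(c) for j, c in enumerate(centers))


def f(x):
    return math.sin(3 * x)


eps = 0.1
radius = eps / 3                              # |f(x) - f(c)| < eps when |x - c| < radius
centers = [i / 50 for i in range(51)]         # spacing 0.02 < radius, so the bumps cover [0, 1]
g = approximate(f, centers, radius)
psi = partition_of_unity(centers, radius)

grid = [i / 1000 for i in range(1001)]
print(max(abs(f(x) - g(x)) for x in grid))    # observed error on the grid
print(max(abs(sum(psi(j, x) for j in range(len(centers))) - 1.0) for x in grid))
```

The approximation $g$ takes values in the span of finitely many values $f(p_j)$,
which is the feature of the construction that carries over to mappings into a
vector space.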



71 Product spaces 

Let $X$, $Y$ be compact Hausdorff topological spaces, and let $X \times Y$ be their
Cartesian product, equipped with the product topology. Thus $X \times Y$ is also
a compact Hausdorff space. Also let $f(x, y)$ be a continuous real or complex-valued
function on $X \times Y$, and let $\epsilon > 0$ be given. For each $x \in X$ and $y \in Y$,
there are open sets $U(x, y) \subseteq X$, $V(x, y) \subseteq Y$ such that $x \in U(x, y)$, $y \in V(x, y)$,
and

(71.1)    $|f(u, v) - f(w, z)| < \epsilon$

for every $u, w \in U(x, y)$ and $v, z \in V(x, y)$, by the continuity of $f$ at $(x, y)$ and
the definition of the product topology. If we fix $x \in X$ for a moment, and apply
this to each $y \in Y$, then the open sets $V(x, y)$, $y \in Y$, form an open covering of
$Y$. By compactness of $Y$, there are finitely many elements $y_1, \ldots, y_n$ of $Y$ such
that

(71.2)    $Y = \bigcup_{j=1}^n V(x, y_j).$

Put $U(x) = \bigcap_{j=1}^n U(x, y_j)$, so that $U(x)$ is an open set in $X$ that contains $x$.
Moreover,

(71.3)    $|f(u, y) - f(w, y)| < \epsilon$

for every $u, w \in U(x)$ and $y \in Y$, by applying (71.1) to $v = z = y$, which is
contained in $V(x, y_j)$ for some $j$. Similarly, one can use compactness of $X$ to show
that for every $y \in Y$ there is an open set $V(y) \subseteq Y$ such that $y \in V(y)$ and

(71.4)    $|f(x, v) - f(x, z)| < \epsilon$

for every $v, z \in V(y)$ and $x \in X$.

Let $\mu$, $\nu$ be regular Borel probability measures on X, Y, respectively. By
the Riesz representation theorem, this is equivalent to having positive linear
functionals on the spaces of continuous functions on X, Y that take the value
1 on the constant functions identically equal to 1 on these spaces. If f(x, y) is
a continuous function on $X \times Y$, then it follows from the uniform continuity
properties in the previous paragraph that

(71.5)    $\int_X f(x, y) \, d\mu(x)$,   $\int_Y f(x, y) \, d\nu(y)$

are continuous functions on Y, X, respectively. Thus

(71.6)    $\int_Y \Bigl(\int_X f(x, y) \, d\mu(x)\Bigr) d\nu(y)$,   $\int_X \Bigl(\int_Y f(x, y) \, d\nu(y)\Bigr) d\mu(x)$

define nonnegative linear functionals on the space of continuous functions on
$X \times Y$ that take the value 1 on the constant function 1. One can also show
that these two linear functionals are the same, because they are the same when
f is a linear combination of products of continuous functions on X and Y, and
because these functions are dense in the space of all continuous functions on
$X \times Y$ with respect to the supremum norm. The latter statement can be verified
using partitions of unity on X and uniform continuity over Y, for instance, as in
the preceding section and paragraph. The Riesz representation theorem implies
that there is a unique regular Borel probability measure $\mu \times \nu$ on $X \times Y$ such
that this linear functional on the space of continuous functions on $X \times Y$ is
given by

(71.7)    $\int_{X \times Y} f(x, y) \, d(\mu \times \nu)(x, y)$.

There are analogous arguments for nonnegative Borel measures with suitable
regularity properties on locally compact Hausdorff spaces, which correspond to
nonnegative linear functionals on continuous functions with compact support
on these spaces. If the measures are finite, then one can simply compactify the
spaces using one-point compactifications.
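
In the simplest case, where X and Y are finite sets with the discrete topology,
the equality of the two iterated integrals in (71.6) and their agreement with
(71.7) can be checked by direct summation. The Python sketch below is only an
illustration: the weights mu, nu and the function f are made-up examples, and
exact rational arithmetic is used so that the comparison is not affected by
rounding.

```python
from fractions import Fraction

# Assumed finite spaces X = {"a", "b", "c"} and Y = {0, 1}, with probability weights.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def f(x, y):
    """An arbitrary function on X x Y (every function is continuous here)."""
    return ord(x) * (y + 1) + y

def integrate_x_then_y(h):
    """First integrate in x against mu, then in y against nu."""
    return sum(nu[y] * sum(mu[x] * h(x, y) for x in mu) for y in nu)

def integrate_y_then_x(h):
    """First integrate in y against nu, then in x against mu."""
    return sum(mu[x] * sum(nu[y] * h(x, y) for y in nu) for x in mu)

def integrate_product(h):
    """Integrate against the product weights mu(x) * nu(y)."""
    return sum(mu[x] * nu[y] * h(x, y) for x in mu for y in nu)

same_value = integrate_x_then_y(f) == integrate_y_then_x(f) == integrate_product(f)
```

Here the product measure is simply given by the weights mu(x) nu(y), which is
the discrete counterpart of the measure obtained from the Riesz representation
theorem.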

Let $\beta_X$, $\beta_Y$ be bases for the topologies of X, Y, respectively. It is easy to
see that

(71.8)    $\beta_{X \times Y} = \{U \times V : U \in \beta_X, V \in \beta_Y\}$

is a base for the topology of $X \times Y$. In particular, $\beta_{X \times Y}$ has only finitely or
countably many elements when $\beta_X$, $\beta_Y$ have only finitely or countably many
elements. In this case, it follows that every open set in $X \times Y$ is the union of
finitely or countably many products of open subsets of X and Y. Otherwise,
one can check that an open set in $X \times Y$ that is also $\sigma$-compact is the union of
finitely or countably many products of open subsets of X and Y.

72 Product spaces, 2

Let I be a nonempty set, and suppose that for each $i \in I$ we have a topological
space $X_i$. In practice, we shall be interested in sets I with only finitely or
countably many elements. Let $X = \prod_{i \in I} X_i$ be the corresponding Cartesian
product, equipped with the product topology.

Suppose that $\beta_i$ is a base for the topology of $X_i$ for each $i \in I$, and let $\beta$
be the collection of subsets of X of the form $\prod_{i \in I} U_i$, where $U_i \in \beta_i$ for each
$i \in I$, and $U_i = X_i$ for all but finitely many i. It is easy to check that $\beta$ is a
base for the product topology on X. If I has only finitely or countably many
elements, and each $\beta_i$ has only finitely or countably many elements, then $\beta$ has
only finitely or countably many elements too. This follows from the fact that
the Cartesian product of finitely many countable sets is countable when I has
only finitely many elements. If I is a countably infinite set, then one can use the
same argument for finite subsets of I, and apply this to an increasing sequence
of finite subsets of I whose union is all of I.

If $X_i$ is Hausdorff for each $i \in I$, then X is Hausdorff. If $X_i$ is compact for
each $i \in I$, then X is compact, by Tychonoff's theorem. Of course, this is much
more elementary when I has only finitely many elements. If I has only finitely
or countably many elements and each $X_i$ is metrizable, then X is metrizable,
and compactness can be handled in a simpler way using sequential compactness.
This approach can also be applied directly when I has only finitely or countably
many elements and there is a base for the topology of $X_i$ with only finitely or
countably many elements for each $i \in I$, so that there is also a base for the
topology of X with only finitely or countably many elements.

Let f be a continuous real or complex-valued function on X. For each $\epsilon > 0$
and $x \in X$, there is an open set $U(x)$ in X such that $x \in U(x)$ and

(72.1)    $|f(y) - f(z)| < \epsilon$

for every $y, z \in U(x)$. More precisely, we can take $U(x)$ to be a basic open
set in the product topology, so that there is a finite set $I(x) \subseteq I$ such that
$U(x) = \prod_{i \in I} U_i(x)$ for some open sets $U_i(x) \subseteq X_i$, where $U_i(x) = X_i$ for every
$i \in I \setminus I(x)$. In particular, if $y \in U(x)$, $z \in X$, and $y_i = z_i$ for each $i \in I(x)$,
then it follows that $z \in U(x)$, and hence (72.1) holds.

If $X_i$ is compact for each $i \in I$, so that X is compact, then there are finitely
many elements $x(1), \ldots, x(n)$ of X such that

(72.2)    $X = \bigcup_{j=1}^n U(x(j))$.

Put $I_\epsilon = \bigcup_{j=1}^n I(x(j))$, so that $I_\epsilon \subseteq I$ has only finitely many elements. If
$y, z \in X$ satisfy $y_i = z_i$ for every $i \in I_\epsilon$, then it is easy to see that (72.1) holds.
This is because $y \in U(x(j))$ for some $j = 1, \ldots, n$, and so $z \in U(x(j))$ too.
Thus continuous functions on X may be approximated uniformly by functions
of finitely many variables under these conditions.

Suppose that $\mu_i$ is a regular Borel probability measure on $X_i$ for each i. If
$A \subseteq I$ is a nonempty set with only finitely many elements, then let $L_A(f)$ be
the function on X which is constant in $x_i$ for each $i \in A$ obtained by integrating
f in $x_i$ with respect to $\mu_i$ for each $i \in A$. If $A \cap I_\epsilon = \emptyset$, then

(72.3)    $|L_A(f)(y) - f(y)| < \epsilon$

for every $y \in X$, since (72.1) holds for every $z \in X$ such that $y_i = z_i$ when
$i \in I \setminus A$. If $A, B \subseteq I$ are finite sets such that $I_\epsilon \subseteq A, B$, then

(72.4)    $|L_A(f)(y) - L_B(f)(y)| < 2\epsilon$

for every $y \in X$. This uses the previous estimate applied to $A \setminus B$ and to
$B \setminus A$ to estimate the difference between each of $L_A(f)$, $L_B(f)$ and $L_{A \cap B}(f)$.

Let $\mathcal{A}$ be the collection of all finite subsets of I, ordered by inclusion. This
is a directed system, because for every $A, B \in \mathcal{A}$ we have that $A \cup B \in \mathcal{A}$
and $A, B \subseteq A \cup B$. If f is a continuous function on X, then one can think of
$\{L_A(f)\}_{A \in \mathcal{A}}$ as a net of functions on X indexed by $\mathcal{A}$. One can show that this
net converges uniformly to a constant on X for every continuous function on
X. This uses the fact that the net satisfies a uniform Cauchy condition on X,
as in the previous paragraph.

In the limit, we get a positive linear functional on the space of continuous
functions on X which takes the value 1 on the constant function 1. The Riesz
representation theorem implies that this linear functional can be expressed in
terms of a unique regular Borel probability measure on X, which corresponds
to the product of the $\mu_i$'s. As usual, the situation is especially nice when I is
countably infinite, and each $X_i$ has a base $\beta_i$ for its topology with only finitely
or countably many elements. This leads to a base $\beta$ for the topology of X
consisting of only finitely or countably many basic open sets in X, as before,
which implies in particular that every open set in X is the union of finitely or
countably many basic open sets. Otherwise, every open set in X that is also
$\sigma$-compact is the union of finitely or countably many basic open sets, as in the
previous section.

Part III 

Conditional expectation and 
martingales 

73 $\sigma$-Subalgebras

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$. Thus
$(X, \mathcal{B}, \mu)$ is also a probability space, where the measure $\mu$ is restricted to $\mathcal{B}$. If
a real or complex-valued function f on X is measurable with respect to $\mathcal{B}$, then
it is automatically measurable with respect to $\mathcal{A}$ as well. If f is measurable
with respect to $\mathcal{B}$ and integrable with respect to $\mu$, then f is also integrable as
a function which is measurable with respect to $\mathcal{A}$, and the integral

(73.1)    $\int_X f \, d\mu$

is the same with respect to both $\mathcal{A}$ and $\mathcal{B}$.

For example, $\mathcal{B}$ might consist of only the empty set and X itself, in which
case the only functions on X that are measurable with respect to $\mathcal{B}$ are constant
functions. As another example, one might take X to be the closed unit interval
[0, 1], $\mathcal{A}$ to be the $\sigma$-algebra of Lebesgue measurable subsets of [0, 1], $\mu$ to be
Lebesgue measure on [0, 1], and $\mathcal{B}$ to be the $\sigma$-algebra of Borel subsets of [0, 1].
It is well known that for each Lebesgue measurable set $A \subseteq [0, 1]$ there are
Borel sets $B_1, B_2 \subseteq [0, 1]$ such that $B_1 \subseteq A \subseteq B_2$ and $\mu(B_2 \setminus B_1) = 0$. More
precisely, one can take $B_1$ to be a countable union of compact sets, and $B_2$ to
be a countable intersection of relatively open sets in [0, 1].

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2)$ be probability spaces, and let $X = X_1 \times X_2$
be their Cartesian product, with the corresponding product measure $\mu_1 \times \mu_2$
and $\sigma$-algebra $\mathcal{A}$. Let $\mathcal{B}_1$ be the collection of subsets of X of the form $E \times X_2$
with $E \in \mathcal{A}_1$, and let $\mathcal{B}_2$ be the collection of subsets of X of the form $X_1 \times E$
with $E \in \mathcal{A}_2$. It is easy to see that $\mathcal{B}_1$, $\mathcal{B}_2$ are $\sigma$-subalgebras of $\mathcal{A}$, and that
a function $f(x_1, x_2)$ on X is measurable with respect to $\mathcal{B}_1$ or $\mathcal{B}_2$ if and only
if it is measurable with respect to $\mathcal{A}$ and constant in $x_2$ or $x_1$, respectively.
Thus measurable functions on X with respect to $\mathcal{B}_1$, $\mathcal{B}_2$ may be identified with
functions on $X_1$, $X_2$ that are measurable with respect to $\mathcal{A}_1$, $\mathcal{A}_2$, respectively.

As a variant of this, suppose that $X_1$, $X_2$ are topological spaces, and let
$X = X_1 \times X_2$ be equipped with the product topology. If $A_1 \subseteq X_1$, $A_2 \subseteq X_2$ are
Borel sets, then $A_1 \times X_2$, $X_1 \times A_2$ are Borel sets in X, by standard reasoning.
In particular,

(73.2)    $A_1 \times A_2 = (A_1 \times X_2) \cap (X_1 \times A_2)$

is a Borel set in X. At any rate, the collections of subsets of X of the form
$A_1 \times X_2$, $X_1 \times A_2$, where $A_1$, $A_2$ are Borel subsets of $X_1$, $X_2$, respectively, are
$\sigma$-subalgebras of the Borel sets in X. As in Section 71, if there are bases for
the topologies of $X_1$, $X_2$ with only finitely or countably many elements, then
every open set in X is the union of finitely or countably many products of open
subsets of $X_1$ and $X_2$. This implies that every open set in X is in the $\sigma$-algebra
generated by products of Borel sets in $X_1$, $X_2$, and hence that every Borel set
in X is in this $\sigma$-algebra. It follows that the $\sigma$-algebra of subsets of X generated
by products of Borel sets in $X_1$, $X_2$ is the same as the $\sigma$-algebra of Borel sets
in X under these conditions.

74 $L^p$ Spaces

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$. If f, g
are measurable functions on X with respect to $\mathcal{A}$, then

(74.1)    $\{x \in X : f(x) = g(x)\}$

is a measurable set in X with respect to $\mathcal{A}$. If f, g are measurable with respect
to $\mathcal{B}$, then (74.1) is measurable with respect to $\mathcal{B}$. Of course, f and g are said
to be equal almost everywhere with respect to $\mu$ when

(74.2)    $\mu(\{x \in X : f(x) \ne g(x)\}) = 0$.

Let $L^p(X, \mathcal{A})$, $L^p(X, \mathcal{B})$ be the $L^p$ spaces of measurable functions on X with
respect to $\mathcal{A}$, $\mathcal{B}$, for $0 < p \le \infty$. These spaces also involve the measure $\mu$, but
we omit this from the notation when it is unambiguous. Because measurable
functions on X with respect to $\mathcal{B}$ are also measurable with respect to $\mathcal{A}$, we get
an isometric linear embedding of $L^p(X, \mathcal{B})$ into $L^p(X, \mathcal{A})$ for each p, $0 < p \le \infty$.

Note that $L^p(X, \mathcal{B})$ corresponds to a closed linear subspace of $L^p(X, \mathcal{A})$ for
each p, $0 < p \le \infty$. One way to see this is to use the completeness of $L^p(X, \mathcal{B})$
and the fact that the embedding into $L^p(X, \mathcal{A})$ is isometric. Basically the same
argument can be given more explicitly as follows. Suppose that $\{f_j\}_{j=1}^\infty$ is a
sequence of elements of $L^p(X, \mathcal{B})$ that converges in the $L^p$ norm to $f \in L^p(X, \mathcal{A})$.
By passing to a subsequence, we may suppose that $\{f_j\}_{j=1}^\infty$ converges pointwise
almost everywhere to f. It is well known that the set of $x \in X$ such that
$\{f_j(x)\}_{j=1}^\infty$ converges in R or C, as appropriate, is measurable with respect to
$\mathcal{B}$, because each $f_j$ is measurable with respect to $\mathcal{B}$. The complement of this set
has measure 0 by hypothesis, and we may suppose that $\{f_j(x)\}_{j=1}^\infty$ converges
in R or C for every $x \in X$, by setting $f_j(x) = 0$ on the set where the sequence
does not converge initially. The limit is automatically measurable with respect
to $\mathcal{B}$, and equal to f almost everywhere. This shows that f is in the image of
$L^p(X, \mathcal{B})$ in $L^p(X, \mathcal{A})$, as desired.

75 Conditional expectation

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$. If
$f \in L^1(X, \mathcal{A})$, then

(75.1)    $\mu_f(A) = \int_A f \, d\mu$

defines a real or complex measure on $\mathcal{A}$, as appropriate. By construction, $\mu_f$
is absolutely continuous with respect to $\mu$. Hence the restriction of $\mu_f$ to $\mathcal{B}$ is
absolutely continuous with respect to the restriction of $\mu$ to $\mathcal{B}$. The
Radon-Nikodym theorem implies that there is a measurable function $f_{\mathcal{B}}$ on X with
respect to $\mathcal{B}$ which is integrable with respect to $\mu$ and satisfies

(75.2)    $\mu_f(B) = \int_B f_{\mathcal{B}} \, d\mu$

for every $B \in \mathcal{B}$. If $f'_{\mathcal{B}}$ is another measurable function on X with respect to $\mathcal{B}$
which is integrable with respect to $\mu$ and satisfies

(75.3)    $\mu_f(B) = \int_B f'_{\mathcal{B}} \, d\mu$

for every $B \in \mathcal{B}$, then it is easy to see that $f'_{\mathcal{B}} = f_{\mathcal{B}}$ almost everywhere with
respect to $\mu$. Thus $f_{\mathcal{B}}$ is uniquely determined as an element of $L^1(X, \mathcal{B})$. This
function $f_{\mathcal{B}}$ is known as the conditional expectation of f with respect to $\mathcal{B}$, and
may be denoted $E(f \mid \mathcal{B})$.

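For example, if $\mathcal{B} = \{\emptyset, X\}$, so that only constant functions are measurable
with respect to $\mathcal{B}$, then $E(f \mid \mathcal{B})$ reduces to the ordinary expectation

(75.4)    $E(f) = \int_X f \, d\mu$.

If $\mathcal{A} = \mathcal{B}$, then $f_{\mathcal{B}} = f$. For any $\mathcal{A}$, $\mathcal{B}$, we can take $f_{\mathcal{B}} = f$ when f is measurable
with respect to $\mathcal{B}$.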

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2)$ be probability spaces, and let $X = X_1 \times X_2$
with the product measure $\mu = \mu_1 \times \mu_2$ and corresponding $\sigma$-algebra $\mathcal{A}$. Also let
$\mathcal{B}_1$, $\mathcal{B}_2$ be the $\sigma$-subalgebras of $\mathcal{A}$ defined in Section 73. If $f(x_1, x_2) \in L^1(X, \mathcal{A})$,
then

(75.5)    $f_1(x_1) = \int_{X_2} f(x_1, x_2) \, d\mu_2(x_2)$,

(75.6)    $f_2(x_2) = \int_{X_1} f(x_1, x_2) \, d\mu_1(x_1)$

are defined almost everywhere on $X_1$, $X_2$, respectively, and determine integrable
functions on these spaces, as in Fubini's theorem. In this case,

(75.7)    $f_{\mathcal{B}_1}(x_1, x_2) = f_1(x_1)$,   $f_{\mathcal{B}_2}(x_1, x_2) = f_2(x_2)$

are measurable functions on X with respect to $\mathcal{B}_1$, $\mathcal{B}_2$, respectively, and satisfy
the requirements of the conditional expectation, again by Fubini's theorem.
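
For finite sets $X_1$, $X_2$ the claim in (75.7) can be checked directly: integrating
out the second variable gives a function that satisfies the defining property
(75.2) on every set of the form $E \times X_2$. The Python sketch below is only an
illustration; the weights mu1, mu2 and the function f are made-up examples, and
integral_over is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations

# Assumed finite probability spaces X_1 = {0, 1} and X_2 = {"u", "v", "w"}.
mu1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
mu2 = {"u": Fraction(1, 4), "v": Fraction(1, 4), "w": Fraction(1, 2)}

def f(x1, x2):
    """An arbitrary integrable function on X_1 x X_2."""
    return (x1 + 1) * ord(x2) - 100 * x1

def f1(x1):
    """Integrate out the second variable, as in (75.5)."""
    return sum(f(x1, x2) * mu2[x2] for x2 in mu2)

def integral_over(E, h):
    """Integral of h over E x X_2 with respect to mu1 x mu2."""
    return sum(h(x1, x2) * mu1[x1] * mu2[x2] for x1 in E for x2 in mu2)

# The sets E x X_2, with E any subset of X_1, are the elements of B_1.
events = [set(c) for r in range(len(mu1) + 1) for c in combinations(mu1, r)]
defining_property_holds = all(
    integral_over(E, f) == integral_over(E, lambda x1, x2: f1(x1))
    for E in events
)
```

The function (x1, x2) -> f1(x1) is constant in x2, hence measurable with
respect to $\mathcal{B}_1$, which is the other requirement of the conditional expectation.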



76 Product spaces, 3 

Let $X_1$, $X_2$ be compact Hausdorff topological spaces, and let $X = X_1 \times X_2$ be
their Cartesian product, with the product topology. Also let $\mu_1$, $\mu_2$ be regular
Borel probability measures on $X_1$, $X_2$, respectively, which may be given by
positive linear functionals on the spaces of continuous functions on $X_1$, $X_2$ that
take the value 1 on the constant functions equal to 1 on these spaces, by the
Riesz representation theorem. If $f(x_1, x_2)$ is a continuous function on X, then

(76.1)    $f_1(x_1) = \int_{X_2} f(x_1, x_2) \, d\mu_2(x_2)$,   $f_2(x_2) = \int_{X_1} f(x_1, x_2) \, d\mu_1(x_1)$

are continuous functions on $X_1$, $X_2$, respectively, by the uniform continuity
properties of $f(x_1, x_2)$ in each variable separately discussed in Section 71. In
addition,

(76.2)    $\int_{X_1} f_1(x_1) \, d\mu_1(x_1) = \int_{X_2} f_2(x_2) \, d\mu_2(x_2)$

defines a positive linear functional on the space of continuous functions on X
that takes the value 1 on the constant 1, and hence determines a regular Borel
probability measure $\mu$ on X by the Riesz representation theorem, as in Section
71 again.

In this context, one can think of $\mu_f$ as the regular Borel measure on X
determined by

(76.3)    $\phi \mapsto \int_X \phi f \, d\mu$,

as a bounded linear functional on the space of continuous functions on X. If $\psi$
is a continuous function on $X_1$, which can also be considered as a continuous
function on X that is constant in $x_2$, then this linear functional applied to
$\phi(x_1, x_2) = \psi(x_1)$ reduces to

(76.4)    $\int_{X_1} \psi f_1 \, d\mu_1 = \int_X \psi f_1 \, d\mu$.

Of course, there is an analogous statement for continuous functions on $X_2$. In
this way, conditional expectation can be expressed more directly in terms of
linear functionals on continuous functions.



77 Measurable partitions 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{P}$ be a partition of X consisting
of finitely or countably many measurable subsets of X. Thus the elements of
$\mathcal{P}$ are pairwise-disjoint measurable subsets of X whose union is all of X. Let
$\mathcal{B} = \mathcal{B}(\mathcal{P})$ be the collection of subsets of X that can be expressed as unions of
elements of $\mathcal{P}$, including the empty set. It is easy to see that $\mathcal{B}$ is a $\sigma$-subalgebra
of $\mathcal{A}$, and that a function f on X is measurable with respect to $\mathcal{B}$ if and only if
f is constant on each of the elements of $\mathcal{P}$.
If $f \in L^1(X, \mathcal{A})$, then one can check that

(77.1)    $f_{\mathcal{B}}(x) = \frac{1}{\mu(A)} \int_A f \, d\mu$

when $x \in A \in \mathcal{P}$ and $\mu(A) > 0$. Let us ask that $\mu(A) > 0$ for every $A \in \mathcal{P}$, for
the sake of simplicity. Thus $f_{\mathcal{B}}(x)$ is defined for every $x \in X$ by this expression,
and is constant on elements of $\mathcal{P}$, and hence is measurable with respect to $\mathcal{B}$.
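
Formula (77.1) is easy to evaluate on a finite set. The following Python sketch
is only an illustration under assumed data: X = {0, ..., 5} with the uniform
measure, $f(x) = x^2$, and a partition into three blocks; the function name
conditional_expectation is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def conditional_expectation(f, mu, partition):
    """Average f over each block of the partition, as in (77.1).

    f and mu are dicts keyed by the points of a finite set X, and every
    block of the partition is assumed to have positive measure.
    """
    result = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        average = sum(f[x] * mu[x] for x in block) / mass
        for x in block:
            result[x] = average  # constant on the block, hence B-measurable
    return result

# Assumed example: uniform measure on X = {0, ..., 5} and f(x) = x**2.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
f = {x: Fraction(x * x) for x in X}
partition = [{0, 1}, {2, 3, 4}, {5}]
f_B = conditional_expectation(f, mu, partition)
```

In this example f_B equals 1/2 on {0, 1}, 29/3 on {2, 3, 4}, and 25 on {5},
and the integrals of f and f_B over each block agree, as required by (75.2).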

If $\nu$ is a real or complex measure on $\mathcal{A}$, then the restriction of $\nu$ to a
$\sigma$-subalgebra $\mathcal{B}$ of $\mathcal{A}$ may be absolutely continuous with respect to the restriction
of $\mu$ to $\mathcal{B}$, even if $\nu$ is not absolutely continuous with respect to $\mu$ on $\mathcal{A}$. In this
case, the Radon-Nikodym theorem implies that there is a unique $f_{\mathcal{B}} \in L^1(X, \mathcal{B})$
such that

(77.2)    $\nu(B) = \int_B f_{\mathcal{B}} \, d\mu$

for every $B \in \mathcal{B}$, as before. If $\mathcal{B} = \mathcal{B}(\mathcal{P})$ and $\mu(A) > 0$ for every $A \in \mathcal{P}$, then
any measure on $\mathcal{B}$ is absolutely continuous with respect to the restriction of $\mu$
to $\mathcal{B}$. As in the previous situation,

(77.3)    $f_{\mathcal{B}}(x) = \frac{\nu(A)}{\mu(A)}$

for every $x \in A$, $A \in \mathcal{P}$.

78 Basic properties 

Let $(X, \mathcal{B}, \mu)$ be a measure space, and let g be a real-valued integrable function
on X. If

(78.1)    $\int_B g \, d\mu \ge 0$

for every $B \in \mathcal{B}$, then $g \ge 0$ almost everywhere on X. To see this, put

(78.2)    $B_0 = \{x \in X : g(x) < 0\}$.

If $\mu(B_0) > 0$, then it follows that

(78.3)    $\int_{B_0} g \, d\mu < 0$,

a contradiction.

Suppose now that g is a real or complex-valued integrable function on X,
and that h is a nonnegative real-valued integrable function on X such that

(78.4)    $\Bigl| \int_B g \, d\mu \Bigr| \le \int_B h \, d\mu$

for every $B \in \mathcal{B}$. We would like to check that $|g| \le h$ almost everywhere on
X under these conditions. If g is real-valued, then we can apply the previous
argument to $h \pm g$, to get that $h \pm g \ge 0$ almost everywhere on X. If g is
complex-valued, then the same argument shows that $\mathrm{Re}(\alpha g) \le h$ almost everywhere on
X for every $\alpha \in C$ with $|\alpha| = 1$. This implies that $|g| \le h$ almost everywhere,
by using a countable dense set of $\alpha$'s in the unit circle.

Now let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$.
If $f \in L^1(X, \mathcal{A})$ is real-valued and nonnegative, then

(78.5)    $\int_B f_{\mathcal{B}} \, d\mu = \int_B f \, d\mu \ge 0$

for every $B \in \mathcal{B}$. This implies that $f_{\mathcal{B}} \ge 0$ almost everywhere on X, by the
argument at the beginning of the section. Of course, it is important here that
$f_{\mathcal{B}}$ is also measurable with respect to $\mathcal{B}$. Similarly, if $f > 0$ almost everywhere
on X, then

(78.6)    $\int_B f_{\mathcal{B}} \, d\mu = \int_B f \, d\mu > 0$

for every $B \in \mathcal{B}$ with $\mu(B) > 0$, and one can use this to show that $f_{\mathcal{B}} > 0$
almost everywhere on X.

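If f is any integrable function on X that is measurable with respect to $\mathcal{A}$,
then we can apply the preceding observation to $|f|$ to get that

(78.7)    $|f|_{\mathcal{B}} = E(|f| \mid \mathcal{B}) \ge 0$

almost everywhere on X. Moreover,

(78.8)    $\Bigl| \int_B f_{\mathcal{B}} \, d\mu \Bigr| = \Bigl| \int_B f \, d\mu \Bigr| \le \int_B |f| \, d\mu = \int_B |f|_{\mathcal{B}} \, d\mu$

for every $B \in \mathcal{B}$, which implies that

(78.9)    $|f_{\mathcal{B}}| \le |f|_{\mathcal{B}}$

almost everywhere on X, by the earlier remarks. As before, it is important here
that both $f_{\mathcal{B}}$ and $|f|_{\mathcal{B}}$ are measurable with respect to $\mathcal{B}$, to apply the arguments
at the beginning of the section. In particular,

(78.10)    $\int_X |f_{\mathcal{B}}| \, d\mu \le \int_X |f|_{\mathcal{B}} \, d\mu = \int_X |f| \, d\mu$,

using the fact that $X \in \mathcal{B}$ in the last step.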

Alternatively, let $\nu$ be a real or complex measure on $\mathcal{A}$, and let $|\nu|$ be the
corresponding total variation measure on $\mathcal{A}$. Also let $\nu_{\mathcal{B}}$ be the restriction of $\nu$
to $\mathcal{B}$, and let $|\nu_{\mathcal{B}}|$ be its total variation, as a measure on $\mathcal{B}$. It is easy to see that

(78.11)    $|\nu_{\mathcal{B}}|(B) \le |\nu|(B)$

for every $B \in \mathcal{B}$, so that $|\nu_{\mathcal{B}}|$ is less than or equal to the restriction of $|\nu|$ to $\mathcal{B}$.
If $f \in L^1(X, \mathcal{A})$ and $\mu_f$ is as in (75.1), then one can show that $|\mu_f| = \mu_{|f|}$. This
gives another way to look at (78.9), since the restriction of $\mu_f$ to $\mathcal{B}$ is given by
integrating $f_{\mathcal{B}}$.

Note that $f \mapsto f_{\mathcal{B}}$ defines a linear mapping from $L^1(X, \mathcal{A})$ into $L^1(X, \mathcal{B})$,
because of the uniqueness of the conditional expectation. More precisely, this
mapping sends $L^1(X, \mathcal{A})$ onto $L^1(X, \mathcal{B})$, because $f_{\mathcal{B}} = f$ when f is measurable
with respect to $\mathcal{B}$. If f, f' are real-valued integrable functions on X that are
measurable with respect to $\mathcal{A}$ and satisfy $f \le f'$ almost everywhere on X, then

(78.12)    $f_{\mathcal{B}} \le f'_{\mathcal{B}}$

almost everywhere on X. This follows from the linearity of the conditional
expectation and the fact that $f' - f \ge 0$ almost everywhere, so that $(f' - f)_{\mathcal{B}} \ge 0$
almost everywhere on X. If f is a real or complex-valued integrable function
on X and f' is a nonnegative real-valued integrable function on X such that
$|f| \le f'$ almost everywhere, then we get that

(78.13)    $|f_{\mathcal{B}}| \le |f|_{\mathcal{B}} \le f'_{\mathcal{B}}$

almost everywhere on X. In particular, this holds when f' is a constant, in
which case $f'_{\mathcal{B}}$ is the same constant. This implies that $f_{\mathcal{B}} \in L^\infty(X, \mathcal{B})$ when
$f \in L^\infty(X, \mathcal{A})$, with

(78.14)    $\|f_{\mathcal{B}}\|_\infty \le \|f\|_\infty$.

Let f be a real-valued integrable function on X that is measurable with
respect to $\mathcal{A}$ and takes values in an interval $I \subseteq R$ almost everywhere. This
interval may be open, closed, or half-open and half-closed, and it may also be
unbounded, such as a half-line or the whole real line. One can check that $f_{\mathcal{B}}$
takes values in I almost everywhere as well, by comparing f with constant
functions. If $\phi : I \to R$ is convex and $\phi \circ f$ is integrable on X, then Jensen's
inequality implies that

(78.15)    $\phi\Bigl(\frac{1}{\mu(A)} \int_A f \, d\mu\Bigr) \le \frac{1}{\mu(A)} \int_A \phi \circ f \, d\mu$

for every $A \in \mathcal{A}$ with $\mu(A) > 0$. Hence

(78.16)    $\phi\Bigl(\frac{1}{\mu(B)} \int_B f_{\mathcal{B}} \, d\mu\Bigr) \le \frac{1}{\mu(B)} \int_B (\phi \circ f)_{\mathcal{B}} \, d\mu$

for every $B \in \mathcal{B}$ with $\mu(B) > 0$, because these averages can be reduced to those
in (78.15). Using this, one can check that

(78.17)    $\phi(f_{\mathcal{B}}) \le (\phi \circ f)_{\mathcal{B}}$

almost everywhere on X. More precisely, one can apply the previous inequality
for averages to sets $B \in \mathcal{B}$ where $f_{\mathcal{B}}$, $(\phi \circ f)_{\mathcal{B}}$ are approximately constant.

Of course, $\phi(t) = |t|^p$ is a convex function on the real line when $1 \le p < \infty$.
If $f \in L^p(X, \mathcal{A})$ is real-valued, then we get that

(78.18)    $|f_{\mathcal{B}}|^p \le (|f|^p)_{\mathcal{B}} = E(|f|^p \mid \mathcal{B})$

almost everywhere on X, as in the previous paragraph. If f is complex-valued,
then one can apply this to $|f|$, to get that

(78.19)    $|f_{\mathcal{B}}|^p \le (|f|_{\mathcal{B}})^p \le (|f|^p)_{\mathcal{B}}$,

using (78.9) in the first step. It follows that

(78.20)    $\int_X |f_{\mathcal{B}}|^p \, d\mu \le \int_X (|f|^p)_{\mathcal{B}} \, d\mu = \int_X |f|^p \, d\mu$,

because $X \in \mathcal{B}$, and that $f_{\mathcal{B}} \in L^p(X, \mathcal{B})$ in particular. Equivalently,

(78.21)    $\|f_{\mathcal{B}}\|_p \le \|f\|_p$,

which also holds when $p = \infty$, as in (78.14).

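Remember that $\mathbf{1}_E(x)$ denotes the indicator function of a set $E \subseteq X$, equal
to 1 when $x \in E$ and to 0 when $x \in X \setminus E$. If $f \in L^1(X, \mathcal{A})$ and $A, E \in \mathcal{A}$, then
of course

(78.22)    $\int_A f \, \mathbf{1}_E \, d\mu = \int_{A \cap E} f \, d\mu$.

If $B, E \in \mathcal{B}$, then $B \cap E \in \mathcal{B}$, and

(78.23)    $\int_B (f \, \mathbf{1}_E)_{\mathcal{B}} \, d\mu = \int_B f \, \mathbf{1}_E \, d\mu = \int_{B \cap E} f \, d\mu$
                $= \int_{B \cap E} f_{\mathcal{B}} \, d\mu = \int_B f_{\mathcal{B}} \, \mathbf{1}_E \, d\mu$.

This implies that

(78.24)    $(f \, \mathbf{1}_E)_{\mathcal{B}} = f_{\mathcal{B}} \, \mathbf{1}_E$,

since $f_{\mathcal{B}} \, \mathbf{1}_E$ is measurable with respect to $\mathcal{B}$. Similarly, if $g \in L^\infty(X, \mathcal{B})$, then

(78.25)    $(f g)_{\mathcal{B}} = f_{\mathcal{B}} \, g$.

This follows from the previous statement by approximating g by simple functions
that are measurable with respect to $\mathcal{B}$. If $f \in L^p(X, \mathcal{A})$, $1 \le p \le \infty$, then (78.25)
also works for $g \in L^q(X, \mathcal{B})$, where $1/p + 1/q = 1$, by the same argument.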
Note that $f_{\mathcal{B}} = 0$ almost everywhere on X if and only if

(78.26)    $\int_B f \, d\mu = 0$

for every $B \in \mathcal{B}$. If $f \in L^p(X, \mathcal{A})$, $1 \le p \le \infty$, then this implies that

(78.27)    $\int_X f g \, d\mu = 0$

for every $g \in L^q(X, \mathcal{B})$, where $1/p + 1/q = 1$ again. This uses the fact that simple
functions are dense in $L^q(X, \mathcal{B})$. If p = 2, then the collection of $f \in L^2(X, \mathcal{A})$
such that $f_{\mathcal{B}} = 0$ is the same as the orthogonal complement of $L^2(X, \mathcal{B})$ as
a linear subspace of $L^2(X, \mathcal{A})$, and $f \mapsto f_{\mathcal{B}}$ is the same as the orthogonal
projection of $L^2(X, \mathcal{A})$ onto $L^2(X, \mathcal{B})$.

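Suppose now that $\mathcal{B}_1$, $\mathcal{B}_2$ are $\sigma$-subalgebras of $\mathcal{A}$, with $\mathcal{B}_1 \subseteq \mathcal{B}_2$. If f is an
integrable function on X with respect to $\mathcal{A}$, then

(78.28)    $(f_{\mathcal{B}_2})_{\mathcal{B}_1} = f_{\mathcal{B}_1}$.

To see this, let $B \in \mathcal{B}_1$ be given, and observe that

(78.29)    $\int_B (f_{\mathcal{B}_2})_{\mathcal{B}_1} \, d\mu = \int_B f_{\mathcal{B}_2} \, d\mu = \int_B f \, d\mu = \int_B f_{\mathcal{B}_1} \, d\mu$,

because $B \in \mathcal{B}_2$ as well. This corresponds to the fact that restricting a measure
$\nu$ on $\mathcal{A}$ to $\mathcal{B}_1$ is the same as restricting $\nu$ to $\mathcal{B}_2$, and then to $\mathcal{B}_1$.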
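
Several of the properties in this section can be checked by hand on a finite
probability space, where conditional expectation with respect to the
$\sigma$-algebra generated by a partition is a block average. The Python sketch
below is only an illustration under assumed data: the weights, the function f
and the two nested partitions are made up, and cond_exp and integral are
hypothetical helpers. It tests (78.28), the case p = 1 of (78.21), and the case
$\phi(t) = t^2$ of (78.17).

```python
from fractions import Fraction

def cond_exp(f, mu, partition):
    """Conditional expectation with respect to the sigma-algebra of a finite partition."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        avg = sum(f[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = avg
    return out

def integral(f, mu):
    """Integral of f over the whole space X."""
    return sum(f[x] * mu[x] for x in mu)

# Assumed data: X = {0, ..., 7} with weights (x + 1) / 36, which sum to 1.
mu = {x: Fraction(x + 1, 36) for x in range(8)}
f = {x: Fraction((-1) ** x * (x + 1)) for x in range(8)}
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]        # generates B_1
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]      # generates B_2, which contains B_1

f_fine = cond_exp(f, mu, fine)
f_coarse = cond_exp(f, mu, coarse)

# (78.28): conditioning on B_2 and then on B_1 is the same as conditioning on B_1.
tower_holds = cond_exp(f_fine, mu, coarse) == f_coarse

# (78.21) with p = 1: the L^1 norm does not increase.
l1_contraction = (
    integral({x: abs(v) for x, v in f_fine.items()}, mu)
    <= integral({x: abs(v) for x, v in f.items()}, mu)
)

# (78.17) with phi(t) = t**2: (f_B)**2 <= (f**2)_B pointwise.
f_squared = {x: v * v for x, v in f.items()}
f_squared_fine = cond_exp(f_squared, mu, fine)
jensen_holds = all(f_fine[x] ** 2 <= f_squared_fine[x] for x in f)
```

Exact rational arithmetic is used so that the equality in (78.28) can be
tested without a tolerance.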

79 Distances between measurable sets

Remember that the symmetric difference $A \triangle B$ of two sets A, B is defined by

(79.1)    $A \triangle B = (A \setminus B) \cup (B \setminus A)$.

If C is another set, then it is easy to see that

(79.2)    $A \triangle C \subseteq (A \triangle B) \cup (B \triangle C)$.

Let $(X, \mathcal{A}, \mu)$ be a probability space, and define $d(A, B)$ for $A, B \in \mathcal{A}$ by

(79.3)    $d(A, B) = \mu(A \triangle B)$.

Thus $d(A, A) = 0$, $d(A, B) = d(B, A) \ge 0$, and

(79.4)    $d(A, C) \le d(A, B) + d(B, C)$

for every $A, B, C \in \mathcal{A}$, by (79.2). This shows that $d(A, B)$ is a semimetric on
$\mathcal{A}$, which means that it satisfies all of the requirements of a metric, except that
$d(A, B) = 0$ may not imply that $A = B$. In this case, $d(A, B) = 0$ when A and
B are the same up to sets of measure 0. Equivalently, $d(A, B)$ is equal to the
distance between the indicator functions $\mathbf{1}_A$, $\mathbf{1}_B$ in $L^1$.

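Observe that $(X \setminus A) \triangle (X \setminus B) = A \triangle B$ for every $A, B \subseteq X$, and hence

(79.5)    $d(X \setminus A, X \setminus B) = d(A, B)$

when $A, B \in \mathcal{A}$. Moreover,

(79.6)    $(A_1 \cup A_2) \triangle (B_1 \cup B_2)$
            $= ((A_1 \cup A_2) \setminus (B_1 \cup B_2)) \cup ((B_1 \cup B_2) \setminus (A_1 \cup A_2))$
            $= (A_1 \setminus (B_1 \cup B_2)) \cup (A_2 \setminus (B_1 \cup B_2)) \cup (B_1 \setminus (A_1 \cup A_2)) \cup (B_2 \setminus (A_1 \cup A_2))$
            $\subseteq (A_1 \setminus B_1) \cup (A_2 \setminus B_2) \cup (B_1 \setminus A_1) \cup (B_2 \setminus A_2)$
            $= (A_1 \triangle B_1) \cup (A_2 \triangle B_2)$

for every $A_1, A_2, B_1, B_2 \subseteq X$. Therefore

(79.7)    $d(A_1 \cup A_2, B_1 \cup B_2) \le d(A_1, B_1) + d(A_2, B_2)$

when $A_1, A_2, B_1, B_2 \in \mathcal{A}$. This implies that

(79.8)    $d(A_1 \cap A_2, B_1 \cap B_2) \le d(A_1, B_1) + d(A_2, B_2)$

for every $A_1, A_2, B_1, B_2 \in \mathcal{A}$, because

(79.9)    $X \setminus (A_1 \cap A_2) = (X \setminus A_1) \cup (X \setminus A_2)$,

and similarly for $X \setminus (B_1 \cap B_2)$. This also uses (79.5) applied to $A_1 \cap A_2$, $B_1 \cap B_2$
instead of A, B, and then to $A_1$, $B_1$ and $A_2$, $B_2$.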
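
The inequalities (79.4), (79.5), (79.7) and (79.8) can be verified exhaustively
on a small finite probability space. The Python sketch below is only an
illustration: the four-point space and its weights are assumptions, and
measure and d are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations

# Assumed probability space: X = {0, 1, 2, 3} with weights (x + 1) / 10.
X = frozenset(range(4))
mu = {x: Fraction(x + 1, 10) for x in X}

def measure(A):
    """mu(A) for a subset A of X."""
    return sum((mu[x] for x in A), Fraction(0))

def d(A, B):
    """Semimetric d(A, B) = mu(A symmetric-difference B), as in (79.3)."""
    return measure(A ^ B)

# All 16 subsets of X.
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

triangle_ok = all(
    d(A, C) <= d(A, B) + d(B, C)
    for A in subsets for B in subsets for C in subsets
)
complement_ok = all(d(X - A, X - B) == d(A, B) for A in subsets for B in subsets)
union_ok = all(
    d(A1 | A2, B1 | B2) <= d(A1, B1) + d(A2, B2)
    for A1 in subsets for A2 in subsets for B1 in subsets for B2 in subsets
)
intersection_ok = all(
    d(A1 & A2, B1 & B2) <= d(A1, B1) + d(A2, B2)
    for A1 in subsets for A2 in subsets for B1 in subsets for B2 in subsets
)
```

Because every point here has positive weight, d is an honest metric in this
example; giving some point weight 0 would produce distinct sets at distance 0.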



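If $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of measurable subsets of X, then
$\{A_j\}_{j=1}^\infty$ converges to their union $\bigcup_{j=1}^\infty A_j$ with respect to $d(A, B)$, in the sense
that

(79.10)    $\lim_{n \to \infty} d\Bigl(A_n, \bigcup_{j=1}^\infty A_j\Bigr) = 0$.

To see this, note that $A_n \subseteq \bigcup_{j=1}^\infty A_j$ for each n, so that

(79.11)    $A_n \triangle \Bigl(\bigcup_{j=1}^\infty A_j\Bigr) = \Bigl(\bigcup_{j=1}^\infty A_j\Bigr) \setminus A_n = \bigcup_{j=n}^\infty (A_{j+1} \setminus A_j)$.

Hence

(79.12)    $d\Bigl(A_n, \bigcup_{j=1}^\infty A_j\Bigr) = \sum_{j=n}^\infty \mu(A_{j+1} \setminus A_j)$.

Of course, the sets $A_{j+1} \setminus A_j$ are pairwise disjoint, and so $\sum_{j=1}^\infty \mu(A_{j+1} \setminus A_j)$
converges, by countable additivity. This implies that

(79.13)    $\lim_{n \to \infty} \sum_{j=n}^\infty \mu(A_{j+1} \setminus A_j) = 0$,

as desired. Similarly, if $B_1 \supseteq B_2 \supseteq \cdots$ is a decreasing sequence of measurable
sets, then $\{B_j\}_{j=1}^\infty$ converges to $\bigcap_{j=1}^\infty B_j$ with respect to $d(A, B)$, in the sense
that

(79.14)    $\lim_{n \to \infty} d\Bigl(B_n, \bigcap_{j=1}^\infty B_j\Bigr) = 0$.

This follows from the previous case applied to $A_j = X \setminus B_j$.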

Let $\{A_j\}_{j=1}^\infty$ be a sequence of subsets of X, and put

(79.15)    $B_k = \bigcup_{j=k}^\infty A_j$,   $C_l = \bigcap_{j=l}^\infty A_j$

for each $k, l \ge 1$. Thus

(79.16)    $B_{k+1} \subseteq B_k$,   $C_l \subseteq C_{l+1}$,   and   $C_k \subseteq B_k$

for each k, l. The upper and lower limits of $\{A_j\}_{j=1}^\infty$ are the subsets of X defined
by

(79.17)    $\limsup_{j \to \infty} A_j = \bigcap_{k=1}^\infty B_k$,   $\liminf_{j \to \infty} A_j = \bigcup_{l=1}^\infty C_l$.

In particular,

(79.18)    $\liminf_{j \to \infty} A_j \subseteq \limsup_{j \to \infty} A_j$.
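
Suppose that $A_j \in \mathcal{A}$ for each j, so that $B_k, C_l \in \mathcal{A}$ for every k, l, and hence

(79.19)    $\limsup_{j \to \infty} A_j$,   $\liminf_{j \to \infty} A_j \in \mathcal{A}$.

Because of monotonicity,

(79.20)    $\lim_{k \to \infty} \mu(B_k) = \mu\Bigl(\limsup_{j \to \infty} A_j\Bigr)$,   $\lim_{l \to \infty} \mu(C_l) = \mu\Bigl(\liminf_{j \to \infty} A_j\Bigr)$.

It follows that

(79.21)    $\mu\Bigl(\limsup_{j \to \infty} A_j\Bigr) = \mu\Bigl(\liminf_{j \to \infty} A_j\Bigr)$

if and only if

(79.22)    $\lim_{n \to \infty} \mu(B_n \setminus C_n) = 0$.

If this condition holds and $A \in \mathcal{A}$ satisfies

(79.23)    $\liminf_{j \to \infty} A_j \subseteq A \subseteq \limsup_{j \to \infty} A_j$,

then it is easy to see that

(79.24)    $\lim_{n \to \infty} d(A_n, A) = 0$.

More precisely,

(79.25)    $A_n \triangle A = (A_n \setminus A) \cup (A \setminus A_n) \subseteq (B_n \setminus A) \cup (A \setminus C_n) = B_n \setminus C_n$,

and so

(79.26)    $d(A_n, A) \le \mu(B_n \setminus C_n) \to 0$ as $n \to \infty$.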

Let us check that (79.22) holds when $\sum_{j=1}^\infty d(A_{j+1}, A_j)$ converges. The main
point is that

(79.27)    $B_n \setminus A_n \subseteq \bigcup_{j=n}^\infty (A_{j+1} \setminus A_j)$,   $A_n \setminus C_n \subseteq \bigcup_{j=n}^\infty (A_j \setminus A_{j+1})$

for each n. More precisely, if $x \in B_n \setminus A_n$, then $x \in A_{j+1}$ for some $j \ge n$,
and $x \notin A_n$. If j is the smallest integer such that $j \ge n$ and $x \in A_{j+1}$, then
$x \notin A_j$, and so $x \in A_{j+1} \setminus A_j$, as desired. Similarly, if $y \in A_n \setminus C_n$, then $y \notin A_{j+1}$
for some $j \ge n$. If j is the smallest integer such that $j \ge n$ and $y \notin A_{j+1}$, then
$y \in A_j$, and so $y \in A_j \setminus A_{j+1}$. This proves (79.27).
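It follows that

(79.28)    $\mu(B_n \setminus A_n) \le \sum_{j=n}^\infty \mu(A_{j+1} \setminus A_j)$,   $\mu(A_n \setminus C_n) \le \sum_{j=n}^\infty \mu(A_j \setminus A_{j+1})$

for each n. Hence

(79.29)    $\mu(B_n \setminus C_n) = \mu(B_n \setminus A_n) + \mu(A_n \setminus C_n) \le \sum_{j=n}^\infty d(A_{j+1}, A_j)$,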

using the fact that $C_n \subseteq A_n \subseteq B_n$ in the first step. If $\sum_{j=1}^\infty d(A_{j+1}, A_j)$
converges, then the right side tends to 0 as $n \to \infty$, and so (79.22) holds. This
implies that there is an $A \in \mathcal{A}$ such that $\lim_{n \to \infty} d(A_n, A) = 0$, by the earlier
remarks. If instead $\{A_j\}_{j=1}^\infty$ satisfies the Cauchy condition

(79.30)    $\lim_{j, l \to \infty} d(A_j, A_l) = 0$,

then there is a subsequence $\{A_{j_n}\}_{n=1}^\infty$ of $\{A_j\}_{j=1}^\infty$ such that $\sum_{n=1}^\infty d(A_{j_{n+1}}, A_{j_n})$
converges. This implies that there is an $A \in \mathcal{A}$ such that $\lim_{n \to \infty} d(A_{j_n}, A) = 0$,
as before. Using the Cauchy condition, one can check that $\lim_{j \to \infty} d(A_j, A) = 0$.

If $\mathcal{E} \subseteq \mathcal{A}$, then let $\overline{\mathcal{E}}$ be the collection of $A \in \mathcal{A}$ such that for each $\epsilon > 0$
there is an $E \in \mathcal{E}$ that satisfies $d(A, E) < \epsilon$. This is basically the same as the
closure of a set in a metric space, except that $d(A, B)$ is only a semimetric. In
particular, note that $\overline{\mathcal{E}}$ automatically contains every $A \in \mathcal{A}$ for which there is
an $E \in \mathcal{E}$ such that $d(A, E) = 0$. As in the context of metric spaces, one can
check that

(79.31)    $\overline{\overline{\mathcal{E}}} = \overline{\mathcal{E}}$.

If $\mathcal{E}$ is a subalgebra of $\mathcal{A}$, then it is easy to see that $\overline{\mathcal{E}}$ is also a subalgebra
of $\mathcal{A}$, using the properties of the distance related to unions, intersections, and
complements discussed earlier in this section.

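Let us check that $\overline{\mathcal{E}}$ is actually a $\sigma$-algebra when $\mathcal{E}$ is an algebra. It suffices
to show that $\bigcup_{j=1}^\infty A_j \in \overline{\mathcal{E}}$ for every sequence $A_1, A_2, \ldots$ of elements of $\overline{\mathcal{E}}$. Of
course, $\bigcup_{j=1}^n A_j \in \overline{\mathcal{E}}$ for each n, because $\overline{\mathcal{E}}$ is an algebra. We also know that
$\bigcup_{j=1}^n A_j$ converges to $\bigcup_{j=1}^\infty A_j$ as $n \to \infty$ with respect to $d(A, B)$, because of
monotonicity. It follows that $\bigcup_{j=1}^\infty A_j \in \overline{\mathcal{E}}$, by combining these two facts
with (79.31).

If $A \in \overline{\mathcal{E}}$, then there is a sequence $\{A_j\}_{j=1}^\infty$ of elements of $\mathcal{E}$ such that
$\sum_{j=1}^\infty d(A_j, A)$ converges. This implies that $\sum_{j=1}^\infty d(A_{j+1}, A_j)$ converges, by the
triangle inequality. Thus $\{A_j\}_{j=1}^\infty$ converges to $\limsup_{j \to \infty} A_j$, $\liminf_{j \to \infty} A_j$
with respect to $d(A, B)$, by the earlier discussion, and A differs from these limits
by sets of measure 0. In particular, $A \in \mathcal{E}$ when $\mathcal{E}$ is a $\sigma$-subalgebra of $\mathcal{A}$ that
contains all elements of $\mathcal{A}$ with measure 0. It follows that $\overline{\mathcal{E}} = \mathcal{E}$ when $\mathcal{E}$ is a
$\sigma$-subalgebra of $\mathcal{A}$ that contains the sets of measure 0.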



80 Sequences of $\sigma$-subalgebras

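Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. Thus
$\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$ is a subalgebra of
$\mathcal{A}$, but not necessarily a $\sigma$-subalgebra. If
$\mathcal{C} = \overline{\mathcal{E}}$ is the closure of $\mathcal{E}$ with
respect to the semimetric $d(A, B)$, then $\mathcal{C}$ is the smallest
$\sigma$-subalgebra of $\mathcal{A}$ that contains $\mathcal{E}$ and the sets of
measure $0$, as in the previous section.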
Put

(80.1)    $f_n = f_{\mathcal{B}_n} = E(f \mid \mathcal{B}_n)$

for each $f \in L^1(X, \mathcal{A})$ and $n \ge 1$, and

(80.2)    $f_\infty = f_{\mathcal{C}} = E(f \mid \mathcal{C}).$

Note that

(80.3)    $f_n = E(f_\infty \mid \mathcal{B}_n)$

for each $n$, since $\mathcal{B}_n \subseteq \mathcal{C}$. If
$f \in L^p(X, \mathcal{A})$ for some $p$, $1 \le p \le \infty$, then
$f_n \in L^p(X, \mathcal{B}_n)$ for each $n$, $f_\infty \in L^p(X, \mathcal{C})$,
and

(80.4)    $\|f_n\|_p \le \|f_\infty\|_p \le \|f\|_p.$

If $f$ happens to be measurable with respect to $\mathcal{B}_l$ for some
$l \ge 1$, then

(80.5)    $f_n = f_\infty = f$

for every $n \ge l$.

If $1 \le p < \infty$, then

(80.6)    $\bigcup_{l=1}^\infty L^p(X, \mathcal{B}_l)$

is dense in $L^p(X, \mathcal{C})$. To see this, one can first approximate
elements of $L^p(X, \mathcal{C})$ by simple functions that are measurable with
respect to $\mathcal{C}$. The latter can then be approximated by simple
functions that are measurable with respect to $\mathcal{B}_l$ for some $l$,
using the definition of $\mathcal{C}$. This implies that

(80.7)    $\lim_{n \to \infty} f_n = f_\infty$

in the $L^p$ norm when $f \in L^p(X, \mathcal{A})$, $1 \le p < \infty$. More
precisely, one may as well take $f = f_\infty$, so that $f$ is already
measurable with respect to $\mathcal{C}$. If $f$ is measurable with respect to
$\mathcal{B}_l$ for some $l$, then one can apply (80.5). Otherwise, one can
approximate $f$ by $g \in L^p(X, \mathcal{B}_l)$ for some $l$, by the previous
remarks about density in $L^p(X, \mathcal{C})$. The main point is that $f_n$ is
also approximated by $g$ when $n \ge l$, uniformly in $n$: in this case
$f_n - g = E(f - g \mid \mathcal{B}_n)$, and hence
$\|f_n - g\|_p \le \|f - g\|_p$, because of (80.4).

Suppose that $X_1, X_2, \ldots$ is a sequence of compact Hausdorff spaces, and
that $X = \prod_{j=1}^\infty X_j$ is their Cartesian product, with the product
topology. Let $\mu_j$ be a regular Borel probability measure on $X_j$ for each
$j$, and let $\mu$ be the corresponding product measure on $X$. Also let
$\mathcal{B}_n$ be the collection of subsets of $X$ of the form
$B \times \prod_{j=n+1}^\infty X_j$, where $B$ is a Borel set in
$\prod_{j=1}^n X_j$. If $f$ is a continuous real or complex-valued function on
$X$, then $f_n$ is the function of $x_1, \ldots, x_n$ obtained by integrating
$f$ in the variables $x_j$ for $j \ge n + 1$. In this case,
$\{f_n\}_{n=1}^\infty$ converges to $f$ uniformly on $X$, because of the uniform
continuity properties discussed in Section 72.
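
As a small illustration of (80.3), (80.4), and (80.5), here is a minimal sketch
in Python. It assumes the simplest concrete model: $X = [0, 1)$ with Lebesgue
measure, $\mathcal{B}_n$ generated by the dyadic intervals of length $2^{-n}$,
and a function $f$ that is constant on dyadic intervals of length $2^{-K}$. In
this finite model conditional expectation is just block averaging. The names
cond_exp and l1_norm and the sample function f are illustrative choices, not
part of these notes.

```python
from fractions import Fraction

def cond_exp(values, n):
    """E(f | B_n) for f given by its values on the 2**K dyadic cells of length 2**-K.

    B_n is generated by the dyadic intervals of length 2**-n (0 <= n <= K),
    so the conditional expectation is the average over each such interval.
    """
    block = len(values) >> n          # number of fine cells per interval of length 2**-n
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out

def l1_norm(values):
    # L^1 norm with respect to Lebesgue measure on [0, 1)
    return sum((abs(v) for v in values), Fraction(0)) / len(values)

K = 6
# Hypothetical sample function, constant on dyadic cells of length 2**-K
f = [Fraction((3 * k * k + 5 * k) % 17 - 8) for k in range(2 ** K)]
norms = [l1_norm(cond_exp(f, n)) for n in range(K + 1)]

print(norms == sorted(norms))                         # norms are nondecreasing in n
print(norms[-1] == l1_norm(f))                        # and bounded by the norm of f, cf. (80.4)
print(cond_exp(cond_exp(f, 4), 2) == cond_exp(f, 2))  # tower property, cf. (80.3)
print(cond_exp(f, K) == f)                            # f is B_K-measurable, cf. (80.5)
```

Exact rational arithmetic is used so that the comparisons are genuine
identities rather than floating-point approximations.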

81 Martingales 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$, also known as a filtration. A
sequence $\{f_j\}_{j=1}^\infty$ of functions on $X$ is said to be a martingale
with respect to this filtration if $f_j \in L^1(X, \mathcal{B}_j)$ for each $j$,
and

(81.1)    $f_j = E(f_l \mid \mathcal{B}_j)$

when $1 \le j \le l$. In particular, this implies that

(81.2)    $\|f_j\|_1 \le \|f_l\|_1$

for each $j \le l$. If $f_j \in L^p(X, \mathcal{B}_j)$ for some $p$,
$1 \le p \le \infty$, and every $j$, then

(81.3)    $\|f_j\|_p \le \|f_l\|_p$

for each $j \le l$. If $f \in L^1(X, \mathcal{A})$ and
$f_j = E(f \mid \mathcal{B}_j)$ for each $j$, then $\{f_j\}_{j=1}^\infty$ is a
martingale.

Let $(X_1, \mathcal{A}_1, \mu_1), (X_2, \mathcal{A}_2, \mu_2), \ldots$ be a
sequence of probability spaces, and let $X = \prod_{j=1}^\infty X_j$ be their
Cartesian product, with the product measure $\mu$ on the corresponding
$\sigma$-algebra $\mathcal{A}$. Also let $\mathcal{B}_n$ be the collection of
subsets of $X$ of the form $B \times \prod_{j=n+1}^\infty X_j$, where $B$ is a
measurable subset of $\prod_{j=1}^n X_j$. This defines an increasing sequence of
$\sigma$-subalgebras of $\mathcal{A}$. Let $a_j$ be an integrable function on
$X_j$ such that

(81.4)    $\int_{X_j} a_j \, d\mu_j = 0$

for each $j$, which can also be considered as an integrable function on $X$ that
does not depend on $x_l$ when $j \ne l$. In this case,

(81.5)    $f_n = \sum_{j=1}^n a_j$

defines a martingale with respect to this filtration.
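
The martingale property of (81.5) can be checked by hand on a finite product.
The following Python sketch assumes the hypothetical model $X_j = \{-1, 1\}$
with the uniform measure, truncated to $N$ factors, and $a_j(x) = c_j x_j$ for
some illustrative weights $c_j$, so that (81.4) holds. Conditional expectation
with respect to $\mathcal{B}_n$ is computed by averaging over the coordinates
$x_{n+1}, \ldots, x_N$.

```python
from fractions import Fraction
from itertools import product

N = 4
coeffs = [Fraction(1), Fraction(1, 2), Fraction(3), Fraction(-2)]  # hypothetical weights c_j
points = list(product([-1, 1], repeat=N))   # X = {-1, 1}^N with the uniform product measure

def a(j, x):
    # a_j depends only on the j-th coordinate (1-based) and has mean zero on X_j
    return coeffs[j - 1] * x[j - 1]

def f(n, x):
    # f_n = a_1 + ... + a_n, as in (81.5)
    return sum((a(j, x) for j in range(1, n + 1)), Fraction(0))

def cond_exp(g, n, x):
    # E(g | B_n)(x): average of g over the coordinates x_{n+1}, ..., x_N
    tails = list(product([-1, 1], repeat=N - n))
    return sum((g(x[:n] + t) for t in tails), Fraction(0)) / len(tails)

# Martingale property (81.1) for consecutive indices: E(f_{n+1} | B_n) = f_n
is_martingale = all(
    cond_exp(lambda y: f(n + 1, y), n, x) == f(n, x)
    for n in range(1, N)
    for x in points
)
print(is_martingale)
```

The check works because averaging $a_{n+1}$ over $x_{n+1}$ gives $0$, while
$a_1, \ldots, a_n$ do not depend on the averaged coordinates.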

Let $(X, \mathcal{A}, \mu)$ be any probability space again, with an increasing
sequence $\mathcal{B}_j$ of $\sigma$-subalgebras of $\mathcal{A}$. Also let
$\{f_j\}_{j=1}^\infty$ be a martingale with respect to this filtration, with
$f_j \in L^2(X, \mathcal{B}_j)$ for each $j$. Thus

(81.6)    $\int_B f_j \, d\mu = \int_B f_{j+1} \, d\mu$

for each $B \in \mathcal{B}_j$, which implies that

(81.7)    $\int_X b \, f_j \, d\mu = \int_X b \, f_{j+1} \, d\mu$

for every $b \in L^2(X, \mathcal{B}_j)$. Equivalently,

(81.8)    $\int_X b \, (f_j - f_{j+1}) \, d\mu = 0$

for every $b \in L^2(X, \mathcal{B}_j)$. It follows that the functions $f_1$ and
$f_{j+1} - f_j$, $j \ge 1$, are all orthogonal to each other in
$L^2(X, \mathcal{A})$.
82 $L^p$ Boundedness



Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. As before, put
$\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$, and let
$\mathcal{C} = \overline{\mathcal{E}}$ be the closure of $\mathcal{E}$ with
respect to the semimetric $d(A, B)$. Let $1 < p \le \infty$ be given, and let
$\{f_j\}_{j=1}^\infty$ be a martingale on $X$ with respect to the
$\mathcal{B}_j$'s such that $f_j \in L^p(X, \mathcal{B}_j)$ for each $j$, and
the $L^p$ norms $\|f_j\|_p$ are uniformly bounded. If $B \in \mathcal{B}_l$ for
some $l$, then

(82.1)    $\int_B f_l \, d\mu = \int_B f_n \, d\mu$

when $n \ge l$. This implies that

(82.2)    $\int_X f_l \, g \, d\mu = \int_X f_n \, g \, d\mu$

when $g \in L^q(X, \mathcal{B}_l)$ and $n \ge l$, where $1/p + 1/q = 1$. In
particular,

(82.3)    $\lim_{n \to \infty} \int_X f_n \, g \, d\mu$

exists for every $g \in L^q(X, \mathcal{B}_l)$, $l \ge 1$. Note that
$\bigcup_{l=1}^\infty L^q(X, \mathcal{B}_l)$ is dense in $L^q(X, \mathcal{C})$,
as in Section 80, because $1 \le q < \infty$. It follows that the limit (82.3)
exists for every $g \in L^q(X, \mathcal{C})$, using also the uniform boundedness
of the $L^p$ norms of the $f_j$'s, as in Section [52].

More precisely, (82.3) defines a bounded linear functional on
$L^q(X, \mathcal{C})$ under these conditions. The Riesz representation theorem
implies that there is an $f \in L^p(X, \mathcal{C})$ such that

(82.4)    $\lim_{n \to \infty} \int_X f_n \, g \, d\mu = \int_X f \, g \, d\mu$

for every $g \in L^q(X, \mathcal{C})$ under these conditions. If
$g \in L^q(X, \mathcal{B}_l)$ for some $l$, then we get that

(82.5)    $\int_X f_l \, g \, d\mu = \int_X f \, g \, d\mu.$

In particular,

(82.6)    $\int_B f_l \, d\mu = \int_B f \, d\mu$

for each $B \in \mathcal{B}_l$, which implies that

(82.7)    $f_l = E(f \mid \mathcal{B}_l)$

for each $l$.

If $1 < p < \infty$, then it follows that $\{f_l\}_{l=1}^\infty$ converges to
$f$ in the $L^p$ norm, as in Section 80. If $p = 2$, then

(82.8)    $\|f_n\|_2^2 = \|f_1\|_2^2 + \sum_{j=1}^{n-1} \|f_{j+1} - f_j\|_2^2$

for each $n$, because of orthogonality, as in the previous section. The
boundedness of the $L^2$ norms $\|f_n\|_2$ is equivalent to the convergence of
the series

(82.9)    $\sum_{j=1}^\infty \|f_{j+1} - f_j\|_2^2,$

which implies the convergence of the series
$\sum_{j=1}^\infty (f_{j+1} - f_j)$ in $L^2(X, \mathcal{C})$. This gives a more
direct proof of the convergence of $\{f_j\}_{j=1}^\infty$ in
$L^2(X, \mathcal{C})$ in this case. Of course, if $\{f_j\}_{j=1}^\infty$ is a
martingale such that $f_j \in L^p(X, \mathcal{B}_j)$ converges to
$f \in L^p(X, \mathcal{A})$ in the $L^p$ norm for any $p$,
$1 \le p \le \infty$, then $f \in L^p(X, \mathcal{C})$ and
$f_l = E(f \mid \mathcal{B}_l)$ for each $l$, for basically the same reasons as
before.
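
The identity (82.8) can be verified exactly in a finite model. The sketch below
is written in Python and assumes, purely for illustration, the dyadic
filtration on $[0, 1)$ and a real-valued function that is constant on dyadic
intervals of length $2^{-K}$, with $f_n$ obtained by block averaging. The
helper names are hypothetical.

```python
from fractions import Fraction

def cond_exp(values, n):
    # E(f | B_n) for the dyadic filtration: average over blocks of length 2**-n
    block = len(values) >> n
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out

def sq_norm(values):
    # squared L^2 norm of a real-valued function with respect to Lebesgue measure on [0, 1)
    return sum((v * v for v in values), Fraction(0)) / len(values)

K = 5
f = [Fraction((7 * k) % 11 - 5) for k in range(2 ** K)]   # hypothetical sample function
mart = {n: cond_exp(f, n) for n in range(1, K + 1)}       # mart[n] represents f_n

def increments_sq(n):
    # sum over j = 1, ..., n-1 of ||f_{j+1} - f_j||_2^2
    return sum(
        (sq_norm([u - v for u, v in zip(mart[j + 1], mart[j])]) for j in range(1, n)),
        Fraction(0),
    )

identity_holds = all(
    sq_norm(mart[n]) == sq_norm(mart[1]) + increments_sq(n) for n in range(1, K + 1)
)
print(identity_holds)
```

Because the increments are orthogonal, the cross terms vanish exactly, which is
why rational arithmetic reproduces (82.8) without any rounding error.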

83 Uniform integrability 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. Also let
$\{f_j\}_{j=1}^\infty$ be a martingale with respect to this filtration with
bounded $L^1$ norms, so that there is a $C \ge 0$ such that

(83.1)    $\|f_n\|_1 \le C$

for each $n$. Note that this holds automatically when $f_j \ge 0$ for each $j$,
because

(83.2)    $\|f_j\|_1 = \int_X f_j \, d\mu = \int_X f_1 \, d\mu$

for each $j \ge 1$ in this case.

Suppose that the $f_j$'s are uniformly integrable as well, in the sense that for
each $\epsilon > 0$ there is a $\delta > 0$ such that

(83.3)    $\int_A |f_n| \, d\mu < \epsilon$

for every $A \in \mathcal{A}$ with $\mu(A) < \delta$ and every $n \ge 1$. It is
well known that this condition holds automatically for a single integrable
function, by approximating that function by bounded functions in the $L^1$ norm,
for instance. Similarly, any finite collection of integrable functions has this
property. Using this, it is easy to check that a sequence of integrable
functions that converges in the $L^1$ norm is uniformly integrable. If there is
a $p > 1$ such that $f_n \in L^p$ for each $n$ and $\|f_n\|_p$ is uniformly
bounded, then $\{f_n\}_{n=1}^\infty$ is uniformly integrable, because of
Hölder's inequality.

If $\{f_n\}_{n=1}^\infty$ satisfies (83.1), then

(83.4)    $\mu(\{x \in X : |f_n(x)| > t\}) \le t^{-1} C$

for each $t > 0$, by Tchebychev's inequality. If $\{f_j\}_{j=1}^\infty$ is
uniformly integrable too, then it follows that

(83.5)    $\int_{\{x \in X : |f_n(x)| > t\}} |f_n(x)| \, d\mu(x) \to 0$ as $t \to \infty$,

uniformly in $n$. Conversely, the latter condition implies that
$\{f_n\}_{n=1}^\infty$ has bounded $L^1$ norms and is uniformly integrable.

As usual, put $\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$, and let
$\mathcal{C} = \overline{\mathcal{E}}$ be the closure of $\mathcal{E}$ with
respect to the semimetric $d(A, B)$. Note that

(83.6)    $\int_A f_n \, d\mu = \int_A f_l \, d\mu$

for every $A \in \mathcal{B}_l$ and $n \ge l$. We would like to show that

(83.7)    $\Big\{ \int_A f_n \, d\mu \Big\}_{n=1}^\infty$

is a Cauchy sequence in $\mathbf{R}$ or $\mathbf{C}$, as appropriate, for every
$A \in \mathcal{C}$, and hence converges. This is obvious when
$A \in \mathcal{E}$, and one can deal with $A \in \mathcal{C}$ by
approximation, using uniform integrability. The main point is that

(83.8)    $\int_A f_n \, d\mu, \quad n \in \mathbf{Z}_+,$

is an equicontinuous family of functions of $A \in \mathcal{A}$ with respect to
the semimetric $d(A, B)$, since

(83.9)    $\Big| \int_A f_n \, d\mu - \int_B f_n \, d\mu \Big| \le \int_{A \triangle B} |f_n| \, d\mu$

for every $A, B \in \mathcal{A}$.
Put

(83.10)    $\nu(A) = \lim_{n \to \infty} \int_A f_n \, d\mu$

for each $A \in \mathcal{C}$. Uniform integrability implies that for each
$\epsilon > 0$ there is a $\delta > 0$ such that

(83.11)    $|\nu(A)| \le \epsilon$

for every $A \in \mathcal{C}$ such that $\mu(A) < \delta$. This follows by
taking the limit as $n \to \infty$ in the definition of uniform integrability of
$\{f_n\}_{n=1}^\infty$, using the same $\delta$ as before.

Clearly $\nu(A)$ is finitely additive on $\mathcal{C}$, and countable additivity
follows from this continuity condition. For if $A_1, A_2, \ldots$ is a sequence
of pairwise-disjoint subsets of $X$ in $\mathcal{C}$, then countable additivity
of $\mu$ implies that

(83.12)    $\lim_{k \to \infty} \mu\Big(\bigcup_{j=k+1}^\infty A_j\Big) = 0,$

and hence

(83.13)    $\lim_{k \to \infty} \nu\Big(\bigcup_{j=k+1}^\infty A_j\Big) = 0$

too, by the continuity condition. Because of finite additivity, we also have
that

(83.14)    $\nu\Big(\bigcup_{j=1}^\infty A_j\Big) = \sum_{j=1}^k \nu(A_j) + \nu\Big(\bigcup_{j=k+1}^\infty A_j\Big)$

for each $k \ge 1$. It follows that $\sum_{j=1}^\infty \nu(A_j)$ converges to
$\nu\big(\bigcup_{j=1}^\infty A_j\big)$, as desired.

Thus $\nu$ is a countably-additive real or complex measure on $\mathcal{C}$, as
appropriate. Moreover, $\nu$ is absolutely continuous with respect to the
restriction of $\mu$ to $\mathcal{C}$. The Radon-Nikodym theorem implies that
there is an $f \in L^1(X, \mathcal{C})$ such that

(83.15)    $\nu(A) = \int_A f \, d\mu$

for every $A \in \mathcal{C}$. In particular,

(83.16)    $\int_A f_l \, d\mu = \int_A f \, d\mu$

when $A \in \mathcal{B}_l$, which implies that

(83.17)    $f_l = E(f \mid \mathcal{B}_l)$

for each $l$. Conversely, this implies that $\{f_l\}_{l=1}^\infty$ converges to
$f$ in the $L^1$ norm, as in Section 80, which implies that
$\{f_l\}_{l=1}^\infty$ is uniformly integrable.

84 Maximal functions, 3 

Let $(X, \mathcal{A}, \mu)$ be a probability space, let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$, and let
$\{f_j\}_{j=1}^\infty$ be a martingale on $X$ with respect to this filtration.
Consider the maximal functions

(84.1)    $f_n^*(x) = \max_{1 \le j \le n} |f_j(x)|$

and

(84.2)    $f^*(x) = \sup_{j \ge 1} |f_j(x)|.$

Note that $f_n^*$ is measurable with respect to $\mathcal{B}_n$, and that

(84.3)    $f^*(x) = \lim_{n \to \infty} f_n^*(x)$

is measurable with respect to the smallest $\sigma$-algebra
$\mathcal{B}_\infty$ that contains
$\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$. If $X = [0, 1)$, $\mu$ is
Lebesgue measure, and $\mathcal{B}_j$ consists of unions of dyadic intervals of
length $2^{-j}$, then this is a variant of the dyadic maximal function, as in
Section 58.
Put

(84.4)    $E(t) = \{x \in X : f^*(x) > t\}$

for each $t > 0$, as well as

(84.5)    $E_1(t) = \{x \in X : |f_1(x)| > t\}$

and

(84.6)    $E_l(t) = \{x \in X : |f_l(x)| > t, \ f_{l-1}^*(x) \le t\}$

when $l \ge 2$. Thus $E_l(t) \in \mathcal{B}_l$ for each $l$, $t$,
$E_l(t) \cap E_n(t) = \emptyset$ when $l < n$, and

(84.7)    $E(t) = \bigcup_{l=1}^\infty E_l(t).$

Similarly,

(84.8)    $\bigcup_{l=1}^n E_l(t) = \{x \in X : f_n^*(x) > t\}$

for each $n \ge 1$. If $l \le n$, then

(84.9)    $t \, \mu(E_l(t)) \le \int_{E_l(t)} |f_l| \, d\mu \le \int_{E_l(t)} |f_n| \, d\mu,$

because $f_l = E(f_n \mid \mathcal{B}_l)$ and hence
$|f_l| \le E(|f_n| \mid \mathcal{B}_l)$, as in Section 75. This implies that

(84.10)    $t \, \mu\Big(\bigcup_{l=1}^n E_l(t)\Big) = \sum_{l=1}^n t \, \mu(E_l(t)) \le \sum_{l=1}^n \int_{E_l(t)} |f_n| \, d\mu.$




Suppose now that the $f_n$'s have bounded $L^1$ norms, so that

(84.11)    $\|f_n\|_1 \le C$

for some $C \ge 0$ and every $n \ge 1$. The previous estimate implies that

(84.12)    $t \, \mu\Big(\bigcup_{l=1}^n E_l(t)\Big) \le C$

for each $n$. Hence

(84.13)    $t \, \mu(E(t)) = t \, \mu\Big(\bigcup_{l=1}^\infty E_l(t)\Big) \le C.$


This is basically the same as the estimates in Sections 135] an d EFJ except that 
the measure $\mu$ here corresponds to Lebesgue measure before, and the
martingale $\{f_j\}_{j=1}^\infty$ corresponds to the measure $\mu$ or function
$f$ before. The martingale may be generated by a function or measure on $X$,
through conditional expectation.
We also have that

(84.14)    $\sum_{l=1}^n \int_{E_l(t)} |f_l| \, d\mu \le \sum_{l=1}^n \int_{E_l(t)} |f_n| \, d\mu \le C$

for each $n$, since $|f_l| \le E(|f_n| \mid \mathcal{B}_l)$ when $l \le n$.
Hence

(84.15)    $\sum_{l=1}^\infty \int_{E_l(t)} |f_l| \, d\mu \le C.$

This shows that the function $h$ defined on $X$ by $h = f_l$ on $E_l(t)$,
$h = 0$ on $X \setminus E(t)$, is integrable, with $\|h\|_1 \le C$.
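
The weak-type estimate behind (84.12) can be tested numerically. The following
Python sketch assumes the dyadic filtration on $[0, 1)$ and a hypothetical
sample function that is constant on dyadic intervals of length $2^{-K}$; it
checks that $t \, \mu(\{f_n^* > t\}) \le \|f_n\|_1$ for a range of $n$ and $t$.

```python
from fractions import Fraction

def cond_exp(values, n):
    # E(f | B_n) for the dyadic filtration: average over blocks of length 2**-n
    block = len(values) >> n
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out

K = 6
cells = 2 ** K
f = [Fraction((5 * k * k + k) % 13 - 6) for k in range(cells)]   # hypothetical sample function
mart = [cond_exp(f, j) for j in range(1, K + 1)]                 # mart[j - 1] represents f_j

def maximal(n):
    # f_n^* = max over 1 <= j <= n of |f_j|, as in (84.1)
    return [max(abs(mart[j][k]) for j in range(n)) for k in range(cells)]

def weak_type_holds(n, t):
    # checks t * mu({f_n^* > t}) <= ||f_n||_1, cf. (84.8) and (84.10)
    level_set = sum(1 for v in maximal(n) if v > t)
    l1 = sum((abs(v) for v in mart[n - 1]), Fraction(0)) / cells
    return t * Fraction(level_set, cells) <= l1

all_hold = all(
    weak_type_holds(n, Fraction(s, 2)) for n in range(1, K + 1) for s in range(1, 25)
)
print(all_hold)
```

Note that the bound uses only $\|f_n\|_1$, not the larger quantity
$\|f_n^*\|_1$, which is the content of the estimate.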

85 Convergence almost everywhere 

Let $(X, \mathcal{A}, \mu)$ be a probability space, let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$, and let
$\{f_j\}_{j=1}^\infty$ be a martingale on $X$ with respect to this filtration.
Observe that

(85.1)    $\{f_n - f_l\}_{n=l}^\infty$

is a martingale with respect to the filtration
$\mathcal{B}_l \subseteq \mathcal{B}_{l+1} \subseteq \cdots$ for each
$l \ge 1$. Put

(85.2)    $A_l(t) = \Big\{x \in X : \sup_{n \ge l} |f_n(x) - f_l(x)| > t\Big\}$

for every $l \ge 1$ and $t > 0$.
If $\|f_n\|_1$ is bounded, then

(85.3)    $t \, \mu(A_l(t)) \le \sup_{n \ge l} \|f_n - f_l\|_1$

for every $t > 0$, as in the previous section. This implies that

(85.4)    $t \, \mu\Big(\bigcap_{l=1}^\infty A_l(t)\Big) \le \inf_{l \ge 1} \Big(\sup_{n \ge l} \|f_n - f_l\|_1\Big)$

for each $t > 0$, and hence

(85.5)    $\mu\Big(\bigcap_{l=1}^\infty A_l(t)\Big) = 0$

for every $t > 0$ when $\{f_n\}_{n=1}^\infty$ is a Cauchy sequence in
$L^1(X, \mathcal{A})$. Thus

(85.6)    $\mu\Big(\bigcup_{k=1}^\infty \bigcap_{l=1}^\infty A_l(1/k)\Big) = 0.$

Of course,

(85.7)    $X \setminus \Big(\bigcup_{k=1}^\infty \bigcap_{l=1}^\infty A_l(1/k)\Big) = \bigcap_{k=1}^\infty \bigcup_{l=1}^\infty \big(X \setminus A_l(1/k)\big).$

If $x$ is in this set, then it is easy to see that $\{f_n(x)\}_{n=1}^\infty$ is
a Cauchy sequence in $\mathbf{R}$ or $\mathbf{C}$, as appropriate. It follows
that $\{f_n\}_{n=1}^\infty$ converges pointwise almost everywhere on $X$ when it
converges in the $L^1$ norm. As in Section 80, this happens when there is an
$f \in L^1(X, \mathcal{A})$ such that $f_n = E(f \mid \mathcal{B}_n)$ for each
$n$. In particular, this happens when $\{f_n\}_{n=1}^\infty$ is uniformly
integrable, as in Section 83. This includes the case where there is a $p > 1$
such that $f_n \in L^p(X, \mathcal{B}_n)$ for each $n$ and $\|f_n\|_p$ is
bounded, as in Section 82.

Suppose that we simply know that $\|f_n\|_1$ is uniformly bounded in $n$. Let
$t > 0$ be given, and put $g_1 = f_1$, and

(85.8)    $g_n(x) = f_n(x)$ when $x \in X \setminus \Big(\bigcup_{l=1}^{n-1} E_l(t)\Big)$
                 $= f_l(x)$ when $x \in E_l(t)$, $1 \le l \le n - 1$

for $n \ge 2$, where $E_l(t)$ is as in the previous section. Note that $g_n$ is
measurable with respect to $\mathcal{B}_n$ for each $n$, because
$E_l(t) \in \mathcal{B}_l \subseteq \mathcal{B}_n$ when $l \le n$, as in the
previous section, and $f_l$ is measurable with respect to $\mathcal{B}_l$ and
hence $\mathcal{B}_n$ when $l \le n$. Moreover,

(85.9)    $\int_X |g_n| \, d\mu = \int_{X \setminus (\bigcup_{l=1}^{n-1} E_l(t))} |f_n| \, d\mu + \sum_{l=1}^{n-1} \int_{E_l(t)} |f_l| \, d\mu$
                               $\le \int_{X \setminus (\bigcup_{l=1}^{n-1} E_l(t))} |f_n| \, d\mu + \sum_{l=1}^{n-1} \int_{E_l(t)} |f_n| \, d\mu$

when $n \ge 2$, using the fact that $|f_l| \le E(|f_n| \mid \mathcal{B}_l)$ in
the second step. This implies that

(85.10)    $\int_X |g_n| \, d\mu \le \int_X |f_n| \, d\mu,$

which obviously holds when $n = 1$ as well.

Let us check that $\{g_n\}_{n=1}^\infty$ is a martingale on $X$ with respect to
the $\mathcal{B}_n$'s. It suffices to show that

(85.11)    $\int_A g_n \, d\mu = \int_A g_{n+1} \, d\mu$

for each $A \in \mathcal{B}_n$ and $n \ge 1$, so that
$g_n = E(g_{n+1} \mid \mathcal{B}_n)$. If
$A \subseteq X \setminus \big(\bigcup_{l=1}^n E_l(t)\big)$, then $g_n = f_n$ and
$g_{n+1} = f_{n+1}$ on $A$, and so

(85.12)    $\int_A g_n \, d\mu = \int_A f_n \, d\mu = \int_A f_{n+1} \, d\mu = \int_A g_{n+1} \, d\mu.$

This uses the facts that $f_n = E(f_{n+1} \mid \mathcal{B}_n)$ and
$A \in \mathcal{B}_n$ in the middle step. If $A \subseteq E_n(t)$, then
$g_n = f_n$ on $A$ because
$A \subseteq X \setminus \big(\bigcup_{l=1}^{n-1} E_l(t)\big)$, and
$g_{n+1} = f_n$ on $A$ by definition of $g_{n+1}$. Hence

(85.13)    $\int_A g_n \, d\mu = \int_A f_n \, d\mu = \int_A g_{n+1} \, d\mu.$

Similarly, if $A \subseteq E_l(t)$ for some $l = 1, \ldots, n - 1$, then
$g_n = g_{n+1} = f_l$ on $A$, and

(85.14)    $\int_A g_n \, d\mu = \int_A f_l \, d\mu = \int_A g_{n+1} \, d\mu.$

Every $A \in \mathcal{B}_n$ can be expressed as the disjoint union of its
intersections with $X \setminus \big(\bigcup_{l=1}^n E_l(t)\big)$ and $E_l(t)$,
$1 \le l \le n$, each of which is in $\mathcal{B}_n$. Thus (85.11) follows by
combining the previous cases.

Now let us check that $\{g_n\}_{n=1}^\infty$ is uniformly integrable. Let $h$ be
the function on $X$ defined by $h = f_n$ on $E_n(t)$ and $h = 0$ on
$X \setminus E(t)$, as in the previous section. Observe that $g_n = h$ on
$\bigcup_{l=1}^n E_l(t)$, while $g_n = f_n$ on
$X \setminus \big(\bigcup_{l=1}^n E_l(t)\big)$. Moreover,

(85.15)    $|g_n| = |f_n| \le t$

on $X \setminus \big(\bigcup_{l=1}^n E_l(t)\big)$, by definition of $E_l(t)$.
This implies that

(85.16)    $|g_n| \le \max(|h|, t)$

on $X$ for each $n$, so that the uniform integrability of
$\{g_n\}_{n=1}^\infty$ follows from the integrability of $h$.

Thus $\{g_n\}_{n=1}^\infty$ converges pointwise almost everywhere on $X$, as
mentioned earlier in the section. By construction, $g_n = f_n$ on
$X \setminus E(t)$ for each $n$, and so $\{f_n\}_{n=1}^\infty$ converges
pointwise almost everywhere on $X \setminus E(t)$ for each $t > 0$. It follows
that $\{f_n\}_{n=1}^\infty$ converges pointwise almost everywhere on

(85.17)    $\bigcup_{k=1}^\infty \big(X \setminus E(k)\big) = X \setminus \Big(\bigcap_{k=1}^\infty E(k)\Big).$

Of course,

(85.18)    $\mu\Big(\bigcap_{k=1}^\infty E(k)\Big) \le \inf_{k \ge 1} \mu(E(k)),$

and $\mu(E(t)) \le t^{-1} \sup_{n \ge 1} \|f_n\|_1 \to 0$ as $t \to \infty$, by
(84.13). Hence

(85.19)    $\mu\Big(\bigcap_{k=1}^\infty E(k)\Big) = 0,$

which implies that $\{f_n\}_{n=1}^\infty$ converges pointwise almost everywhere
on $X$.



86 Other measures 



Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. Also let $\nu$ be a real or
complex measure on a $\sigma$-algebra $\mathcal{B} \subseteq \mathcal{A}$ that
contains each $\mathcal{B}_j$. Suppose that the restriction of $\nu$ to
$\mathcal{B}_j$ is absolutely continuous with respect to the restriction of
$\mu$ to $\mathcal{B}_j$ for each $j$. In particular, this happens when each
$\mathcal{B}_j$ is associated to a partition of $X$ by finitely or countably
many sets of positive $\mu$-measure, as in Section 77. Under these conditions,
the Radon-Nikodym theorem implies that there is an
$f_j \in L^1(X, \mathcal{B}_j)$ for each $j \ge 1$ such that

(86.1)    $\int_B f_j \, d\mu = \nu(B)$

for every $B \in \mathcal{B}_j$.

By construction, $\{f_j\}_{j=1}^\infty$ is a martingale on $X$ with respect to
the $\mathcal{B}_j$'s. Moreover,

(86.2)    $\int_X |f_j| \, d\mu \le |\nu|(X)$

for each $j$, where $|\nu|$ denotes the total variation measure associated to
$\nu$. As in Section [7S], (86.2) basically corresponds to the statement that
the total variation of the restriction of $\nu$ to $\mathcal{B}_j$ is less than
or equal to the restriction of $|\nu|$ to $\mathcal{B}_j$. If $\nu$ is
absolutely continuous with respect to the restriction of $\mu$ to
$\mathcal{B}$, so that there is an $f \in L^1(X, \mathcal{B})$ such that

(86.3)    $\nu(B) = \int_B f \, d\mu$

for every $B \in \mathcal{B}$, then $f_j = E(f \mid \mathcal{B}_j)$ for each
$j$.
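Put

(86.4)    $d'(A, B) = \mu(A \triangle B) + |\nu|(A \triangle B)$

for every $A, B \in \mathcal{B}$. This defines a semimetric on $\mathcal{B}$, as
in Section 79, and the closure $\mathcal{C}'$ of
$\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$ with respect to $d'(A, B)$ is
a $\sigma$-subalgebra of $\mathcal{B}$ that contains $\mathcal{E}$. More
precisely, $\mathcal{C}'$ is the smallest $\sigma$-subalgebra of $\mathcal{B}$
that contains $\mathcal{E}$ and the sets $A \in \mathcal{B}$ such that
$\mu(A) = |\nu|(A) = 0$. In particular, $\mathcal{C}'$ contains the smallest
$\sigma$-algebra $\mathcal{B}_\infty$ that contains $\mathcal{E}$, and
$\mathcal{C}'$ is contained in the closure $\mathcal{C}$ of $\mathcal{E}$ with
respect to $d(A, B) = \mu(A \triangle B)$.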

Suppose that $\{f_j\}_{j=1}^\infty$ converges to a function
$f \in L^1(X, \mathcal{B})$ in the $L^1$ norm. If $A \in \mathcal{B}_l$ for
some $l$, so that

(86.5)    $\int_A f_j \, d\mu = \int_A f_l \, d\mu = \nu(A)$

when $j \ge l$, then

(86.6)    $\int_A f \, d\mu = \lim_{j \to \infty} \int_A f_j \, d\mu = \nu(A).$

Thus

(86.7)    $\int_A f \, d\mu = \nu(A)$

for every $A \in \mathcal{E}$, and hence for every $A \in \mathcal{C}'$, because
both sides of the equation are continuous with respect to $d'(A, B)$. This uses
the analogue of uniform integrability for the single integrable function $f$.
It follows that the restriction of $\nu$ to
$\mathcal{B}_\infty \subseteq \mathcal{C}'$ is absolutely continuous with
respect to the restriction of $\mu$ to $\mathcal{B}_\infty$ under these
conditions.
87 Finitely-additive measures



Let $(X, \mathcal{A}, \mu)$ be a probability space, let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$, and let
$\{f_j\}_{j=1}^\infty$ be a martingale on $X$ with respect to this filtration.
If we put

(87.1)    $\nu(A) = \int_A f_j \, d\mu$

when $A \in \mathcal{B}_j$, then $\nu$ is well-defined on
$\mathcal{E} = \bigcup_{j=1}^\infty \mathcal{B}_j$, because

(87.2)    $\int_A f_j \, d\mu = \int_A f_l \, d\mu$

when $A \in \mathcal{B}_l$ and $j \ge l$. It is easy to see that $\nu$ is
finitely additive on $\mathcal{E}$.

Suppose that the $f_j$'s have bounded $L^1$ norms, so that there is a $C \ge 0$
with the property that $\|f_j\|_1 \le C$ for every $j \ge 1$. Let
$A_1, \ldots, A_n$ be finitely many pairwise-disjoint subsets of $X$ that are
contained in $\mathcal{E}$. Thus $A_1, \ldots, A_n \in \mathcal{B}_l$ for some
$l$, and hence

(87.3)    $\sum_{k=1}^n |\nu(A_k)| = \sum_{k=1}^n \Big| \int_{A_k} f_l \, d\mu \Big| \le \sum_{k=1}^n \int_{A_k} |f_l| \, d\mu \le \int_X |f_l| \, d\mu \le C.$

Conversely, if

(87.4)    $\sum_{k=1}^n |\nu(A_k)| \le C$

for every collection of finitely many pairwise disjoint elements
$A_1, \ldots, A_n$ of $\mathcal{B}_j$, then $\|f_j\|_1 \le C$. If $\nu$ has an
extension to a countably-additive real or complex measure on a $\sigma$-algebra
that contains $\mathcal{E}$, then (87.4) holds for each $j$, with $C$ equal to
the total variation of the extension of $\nu$ on $X$.

For example, let $X$ be $[0, 1)$ equipped with Lebesgue measure, and let
$\mathcal{B}_j$ be the collection of subsets of $[0, 1)$ that are unions of
dyadic intervals of length $2^{-j}$. In this case, $\mathcal{E}$ is the algebra
of subsets of $[0, 1)$ that can be expressed as the union of finitely many
dyadic intervals. Put

(87.5)    $f_j(x) = 0$ when $0 \le x < 1 - 2^{-j}$
                 $= 2^j$ when $1 - 2^{-j} \le x < 1$.

Thus

(87.6)    $\int_I f_j(x) \, dx = 0$

when $I = [l \, 2^{-j}, (l + 1) \, 2^{-j})$, $0 \le l \le 2^j - 2$, and

(87.7)    $\int_I f_j(x) \, dx = 1$

when $I = [1 - 2^{-j}, 1)$, which corresponds to $l = 2^j - 1$. It is easy to
see that $\{f_j\}_{j=1}^\infty$ is a martingale on $[0, 1)$ with respect to this
filtration. The finitely-additive measure $\nu$ on $\mathcal{E}$ is
characterized by $\nu(I) = 1$ when $I$ is a dyadic interval with $1$ as an
endpoint, and $\nu(I) = 0$ for every other dyadic interval $I$. Note that
$\|f_j\|_1 = 1$ for each $j$, and that (87.4) holds with $C = 1$, as it should.
If $I_j = [1 - 2^{-j}, 1)$, then $I_{j+1} \subseteq I_j$ and $\nu(I_j) = 1$ for
each $j \ge 1$, but $\bigcap_{j=1}^\infty I_j = \emptyset$. Basically, this
martingale corresponds to a Dirac mass at the point $1$. Since $1$ is not
included as an element of $X = [0, 1)$, there is no countably-additive measure
on $X$ from which the martingale is obtained.
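
The properties of the example (87.5) are easy to confirm by direct computation.
The Python sketch below represents $f_j$ by its values on the $2^K$ dyadic
cells of length $2^{-K}$, for $j \le K$, and checks the martingale property, the
$L^1$ norms, and the fact that the values at a fixed point are eventually $0$.
The grid size $K$ and the helper names are assumptions made for the
illustration.

```python
from fractions import Fraction

K = 8
CELLS = 2 ** K

def f(j):
    # values of f_j from (87.5) on the dyadic cells of length 2**-K (requires j <= K)
    cutoff = CELLS - (CELLS >> j)      # index of the first cell inside [1 - 2**-j, 1)
    return [Fraction(2 ** j) if k >= cutoff else Fraction(0) for k in range(CELLS)]

def cond_exp(values, n):
    # E(g | B_n): average over each dyadic interval of length 2**-n
    block = len(values) >> n
    out = []
    for start in range(0, len(values), block):
        avg = sum(values[start:start + block], Fraction(0)) / block
        out.extend([avg] * block)
    return out

l1_norms = [sum(f(j), Fraction(0)) / CELLS for j in range(1, K + 1)]
is_martingale = all(cond_exp(f(j + 1), j) == f(j) for j in range(1, K))

# Values of f_1, ..., f_K on the cell [252/256, 253/256): nonzero only while
# the interval [1 - 2**-j, 1) still contains that cell
values_at_cell = [f(j)[252] for j in range(1, K + 1)]

print(l1_norms)
print(is_martingale)
print(values_at_cell)
```

The norms stay equal to $1$ while the functions tend to $0$ pointwise, so the
convergence cannot take place in the $L^1$ norm.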

Let $(X, \mathcal{A}, \mu)$ be any probability space again, with an increasing
sequence $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ of
$\sigma$-subalgebras of $\mathcal{A}$, and let $\{f_j\}_{j=1}^\infty$ be a
martingale on $X$ with respect to this filtration with bounded $L^1$ norms. As
in Section 85, $\{f_j\}_{j=1}^\infty$ converges pointwise almost everywhere on
$X$. The limit determines an element $g$ of $L^1(X, \mathcal{C})$, where
$\mathcal{C}$ is the closure of $\mathcal{E}$ with respect to the usual
semimetric $d(A, B) = \mu(A \triangle B)$ on $\mathcal{A}$, as in Section 79.
Equivalently, $\mathcal{C}$ is the smallest $\sigma$-subalgebra of
$\mathcal{A}$ that contains $\mathcal{E}$ and every $A \in \mathcal{A}$ with
$\mu(A) = 0$. If $g_j = E(g \mid \mathcal{B}_j)$ for each $j$, then
$\{g_j\}_{j=1}^\infty$ is a martingale on $X$ with respect to this filtration
that converges to $g$ in the $L^1$ norm, as in Section 80. Hence
$\{g_j\}_{j=1}^\infty$ also converges to $g$ pointwise almost everywhere on
$X$, as in Section 85. If $h_j = f_j - g_j$, then $\{h_j\}_{j=1}^\infty$ is
also a martingale on $X$ with respect to this filtration, and with bounded
$L^1$ norms. By construction, $\{h_j\}_{j=1}^\infty$ converges to $0$ pointwise
almost everywhere on $X$. One can think of $\{g_j\}_{j=1}^\infty$ as the
"regular part" of the martingale $\{f_j\}_{j=1}^\infty$, and of
$\{h_j\}_{j=1}^\infty$ as the "singular part" of $\{f_j\}_{j=1}^\infty$.



88 Maximal functions, 4 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let
$\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. If
$f \in L^1(X, \mathcal{A})$, then $f_j = E(f \mid \mathcal{B}_j)$ defines a
martingale on $X$ with respect to this filtration, and we get the corresponding
maximal function

(88.1)    $f^*(x) = \sup_{j \ge 1} |f_j(x)|,$

as before. Note that $f \mapsto f^*$ is sublinear, in the sense that

(88.2)    $(a f)^* = |a| \, f^*$

and

(88.3)    $(f + g)^* \le f^* + g^*$

for every $f, g \in L^1(X, \mathcal{A})$ and $a \in \mathbf{R}$ or
$\mathbf{C}$.

If $f \in L^\infty(X, \mathcal{A})$, then $f_j \in L^\infty(X, \mathcal{A})$ for
each $j$, and

(88.4)    $\|f_j\|_\infty \le \|f\|_\infty$

as in Section 75. This implies that $f^* \in L^\infty(X, \mathcal{A})$, and that

(88.5)    $\|f^*\|_\infty \le \|f\|_\infty.$

If $f \in L^1(X, \mathcal{A})$, then

(88.6)    $\|f_j\|_1 \le \|f\|_1$

for each $j$, as in Section 75. Put

(88.7)    $E(t) = \{x \in X : f^*(x) > t\}$

for each $t > 0$, so that

(88.8)    $\mu(E(t)) \le t^{-1} \|f\|_1$

as in Section 84.

Let $g$ be the function defined on $X$ by

(88.9)    $g(x) = f(x)$ when $|f(x)| \le t/2$
               $= 0$ when $|f(x)| > t/2$.

Thus $g \in L^\infty(X, \mathcal{A})$, and hence
$g^* \in L^\infty(X, \mathcal{A})$, with

(88.10)    $\|g^*\|_\infty \le \|g\|_\infty \le t/2.$

This implies that

(88.11)    $f^*(x) \le (f - g)^*(x) + g^*(x) \le (f - g)^*(x) + t/2$

for almost every $x \in X$, so that

(88.12)    $(f - g)^*(x) > t/2$

for almost every $x \in E(t)$.
It follows that

(88.13)    $\mu(E(t)) \le \mu(\{x \in X : (f - g)^*(x) > t/2\}) \le 2 \, t^{-1} \|f - g\|_1,$

by applying (88.8) to $f - g$ with $t/2$ in place of $t$. Using the definition
of $g$, we get that

(88.14)    $\mu(E(t)) \le 2 \, t^{-1} \int_{\{x \in X : |f(x)| > t/2\}} |f(x)| \, d\mu(x).$


If $h$ is a nonnegative measurable function on $X$, then

(88.15)    $A(h) = \{(x, r) \in X \times \mathbf{R} : 0 < r < h(x)\}$

is a measurable subset of $X \times \mathbf{R}$. This is easy to see when $h$
is a measurable simple function, and otherwise $h$ can be approximated by an
increasing sequence of measurable simple functions. Integrating $p \, r^{p-1}$
over $A(h)$ with respect to the product of $\mu$ on $X$ and Lebesgue measure on
$\mathbf{R}$, we get that

(88.16)    $\int_X h^p \, d\mu = \int_0^\infty p \, r^{p-1} \mu(\{x \in X : h(x) > r\}) \, dr.$

More precisely, the left side of (88.16) is obtained by integrating
$p \, r^{p-1}$ over $A(h)$ in $r$ and then $x$, while the right side is obtained
by integrating in $x$ and then $r$.
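
For a simple function, both sides of (88.16) are finite sums and can be compared
exactly. The following Python sketch does this for integer exponents $p$, using
the fact that $\mu(\{h > r\})$ is constant between consecutive values of $h$, so
that the integral of $p \, r^{p-1}$ over such an interval is a difference of
$p$-th powers. The sample values and weights are hypothetical.

```python
from fractions import Fraction

def direct_integral(values, weights, p):
    # left side of (88.16): integral of h**p for a simple function h
    return sum((w * v ** p for v, w in zip(values, weights)), Fraction(0))

def layer_cake(values, weights, p):
    # right side of (88.16) for a nonnegative simple function h
    levels = sorted(set(values) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # mu({h > r}) is constant for lo <= r < hi and equals mu({h > lo})
        tail = sum((w for v, w in zip(values, weights) if v > lo), Fraction(0))
        total += (hi ** p - lo ** p) * tail    # integral of p r^(p-1) over [lo, hi]
    return total

# h takes the listed values on five sets of measure 1/5 each
values = [Fraction(0), Fraction(1, 2), Fraction(3), Fraction(3), Fraction(5)]
weights = [Fraction(1, 5)] * 5

agree = all(
    direct_integral(values, weights, p) == layer_cake(values, weights, p)
    for p in (1, 2, 3)
)
print(agree)
```

The agreement is a telescoping-sum version of the interchange of the order of
integration described above.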
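In particular, if $1 < p < \infty$, then

(88.17)    $\int_X (f^*)^p \, d\mu = \int_0^\infty p \, t^{p-1} \mu(E(t)) \, dt$
                                  $\le \int_0^\infty 2 \, p \, t^{p-2} \int_{\{x \in X : |f(x)| > t/2\}} |f(x)| \, d\mu(x) \, dt,$

by (88.14). Interchanging the order of integration, we get that

(88.18)    $\int_X (f^*)^p \, d\mu \le \int_X \int_0^{2 |f(x)|} 2 \, p \, |f(x)| \, t^{p-2} \, dt \, d\mu(x)$
                                  $= \frac{p \, 2^p}{p - 1} \int_X |f(x)|^p \, d\mu(x).$

This shows that $f^* \in L^p(X, \mathcal{A})$ when $f \in L^p(X, \mathcal{A})$
and $p > 1$.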
By contrast, if $f \in L^p(X, \mathcal{A})$, then

(88.19)    $(f^*(x))^p \le (|f|^p)^*(x),$

by (78.18). As before,

(88.20)    $\mu(\{x \in X : (|f|^p)^*(x) > t\}) \le t^{-1} \int_X |f(x)|^p \, d\mu(x)$

for every $t > 0$. This implies that

(88.21)    $\mu(\{x \in X : (f^*(x))^p > t\}) \le t^{-1} \int_X |f(x)|^p \, d\mu(x),$

or equivalently

(88.22)    $\mu(\{x \in X : f^*(x) > t\}) \le t^{-p} \int_X |f(x)|^p \, d\mu(x)$

for every $t > 0$. This is not strong enough to imply that $f^* \in L^p$, by
integrating over $t$ as in the previous paragraph. However, it does have the
advantage of working uniformly over $p \ge 1$.

Note that we get the same estimates for the dyadic maximal function, as in
Section 58, which corresponds to $X = [0, 1)$ with Lebesgue measure, and where
$\mathcal{B}_j$ consists of unions of dyadic intervals of length $2^{-j}$.
There are also similar estimates for the Hardy-Littlewood maximal function on
the real line, as in Section 46, but with an extra factor of $2$ in (88.8), and
in the later steps.



89 Decreasing sequences of $\sigma$-algebras

Let $(X, \mathcal{A}, \mu)$ be a probability space, and suppose that
$\mathcal{A}_1 \supseteq \mathcal{A}_2 \supseteq \cdots$ is a decreasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. As a basic scenario, it may
be that $X = \prod_{j=1}^\infty X_j$ is the Cartesian product of a sequence of
probability spaces $X_1, X_2, \ldots$, and that $\mathcal{A}_n$ consists of
subsets of $X$ of the form $\big(\prod_{j=1}^n X_j\big) \times A$, where $A$ is
a measurable subset of $\prod_{j=n+1}^\infty X_j$. In this case, conditional
expectation with respect to $\mathcal{A}_n$ corresponds to integrating a
function on $X$ in $x_1, \ldots, x_n$. Basically, conditional expectation with
respect to smaller $\sigma$-algebras corresponds to averaging functions over
larger sets.

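Note that $\mathcal{A}_\infty = \bigcap_{j=1}^\infty \mathcal{A}_j$ is
automatically a $\sigma$-subalgebra of $\mathcal{A}$. If
$A_j \in \mathcal{A}_j$ satisfies $A_j \subseteq A_{j+1}$ for each $j$, then
$\bigcup_{j=1}^\infty A_j \in \mathcal{A}_\infty$, because

(89.1)    $\bigcup_{j=1}^\infty A_j = \bigcup_{j=n}^\infty A_j \in \mathcal{A}_n$

for each $n$. Similarly, if $B_j \in \mathcal{A}_j$ satisfies
$B_{j+1} \subseteq B_j$ for each $j$, then

(89.2)    $\bigcap_{j=1}^\infty B_j = \bigcap_{j=n}^\infty B_j \in \mathcal{A}_n$

for each $n$, and so $\bigcap_{j=1}^\infty B_j \in \mathcal{A}_\infty$. If
$E_j \in \mathcal{A}_j$ for each $j$, then it follows that

(89.3)    $\limsup_{j \to \infty} E_j = \bigcap_{l=1}^\infty \Big(\bigcup_{j=l}^\infty E_j\Big), \quad \liminf_{j \to \infty} E_j = \bigcup_{l=1}^\infty \Big(\bigcap_{j=l}^\infty E_j\Big)$

are also elements of $\mathcal{A}_\infty$, by taking
$A_l = \bigcap_{j=l}^\infty E_j$ and $B_l = \bigcup_{j=l}^\infty E_j$.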

If $f$ is a measurable function on $X$ with respect to $\mathcal{A}$, and if
$f_j$ is a measurable function on $X$ with respect to $\mathcal{A}_j$ such that
$f = f_j$ almost everywhere for each $j$, then there is a measurable function
$f_\infty$ on $X$ with respect to $\mathcal{A}_\infty$ such that
$f = f_\infty$ almost everywhere. To see this, put

(89.4)    $E_j = \{x \in X : f_j(x) = f_{j+1}(x)\},$

so that $E_j \in \mathcal{A}_j$ for each $j$. Thus
$B_l = \bigcap_{j=l}^\infty E_j \in \mathcal{A}_l$, and $f_j(x) = f_l(x)$ for
every $x \in B_l$ and $j \ge l$. By hypothesis, $\mu(X \setminus E_j) = 0$ for
each $j$, and so $\mu(X \setminus B_l) = 0$ for each $l$, since
$X \setminus B_l = \bigcup_{j=l}^\infty (X \setminus E_j)$. We also have that
$\bigcup_{l=1}^\infty B_l \in \mathcal{A}_\infty$, as in the previous paragraph.
Put

(89.5)    $f_\infty(x) = 0$ when $x \in X \setminus \Big(\bigcup_{l=1}^\infty B_l\Big)$
                      $= f_l(x)$ when $x \in B_l$ for some $l \ge 1$.

This is well defined, because $f_j(x) = f_l(x)$ when $x \in B_l$ and
$j \ge l$. Moreover, $f_\infty$ is measurable with respect to $\mathcal{A}_l$
for every $l$, because $f_l$ is measurable with respect to $\mathcal{A}_l$. This
implies that $f_\infty$ is measurable with respect to $\mathcal{A}_\infty$. It
is easy to see that $f = f_\infty$ almost everywhere, since $f = f_l$ almost
everywhere.

Let $\{f_j\}_{j=1}^\infty$ be a sequence of real-valued functions on $X$ such
that $f_j$ is measurable with respect to $\mathcal{A}_j$ for each $j$. Thus

(89.6)    $\sup_{j \ge l} f_j(x), \quad \inf_{j \ge l} f_j(x)$

are measurable with respect to $\mathcal{A}_l$ for each $l$. This implies that

(89.7)    $\limsup_{j \to \infty} f_j(x), \quad \liminf_{j \to \infty} f_j(x)$

are measurable with respect to $\mathcal{A}_l$ for each $l$, and hence are
measurable with respect to $\mathcal{A}_\infty$. In particular, the set of
$x \in X$ on which $\{f_j(x)\}_{j=1}^\infty$ converges is measurable with
respect to $\mathcal{A}_\infty$, and the limit defines a measurable function
with respect to $\mathcal{A}_\infty$ on this set. The analogous statement for
complex-valued functions follows by considering the real and imaginary parts
separately.

Let $f \in L^1(X, \mathcal{A})$ be given, and put
$f_j = E(f \mid \mathcal{A}_j)$ for each $j \ge 1$, and $f_0 = f$. Thus

(89.8)    $f = \sum_{j=1}^n (f_{j-1} - f_j) + f_n$

for each $n \ge 1$. If $f \in L^2(X, \mathcal{A})$, then the functions
$f_{j-1} - f_j$, $1 \le j \le n$, and $f_n$ are pairwise orthogonal in
$L^2(X, \mathcal{A})$, as in Section 81. This implies that

(89.9)    $\|f\|_2^2 = \sum_{j=1}^n \|f_{j-1} - f_j\|_2^2 + \|f_n\|_2^2$

for each $n$, and hence that $\sum_{j=1}^\infty \|f_{j-1} - f_j\|_2^2$
converges. Therefore

(89.10)    $\sum_{j=1}^\infty (f_{j-1} - f_j)$

converges in $L^2(X, \mathcal{A})$, by orthogonality, which implies that
$\{f_n\}_{n=1}^\infty$ converges in $L^2(X, \mathcal{A})$. Of course,
$\{f_n\}_{n=l}^\infty$ converges in $L^2(X, \mathcal{A}_l)$ for each $l$, and
the limits correspond to the same element of $L^2(X, \mathcal{A})$ for each
$l$. Thus the limit may be represented by an element $f_\infty$ of
$L^2(X, \mathcal{A}_\infty)$, by the earlier remarks. In particular,
$f_\infty = E(f_\infty \mid \mathcal{A}_\infty)$, which implies that

(89.11)    $f_\infty = E(f \mid \mathcal{A}_\infty).$

This uses the fact that
$E(f \mid \mathcal{A}_\infty) = E(f_j \mid \mathcal{A}_\infty)$ for each $j$,
since $f_j = E(f \mid \mathcal{A}_j)$ and
$\mathcal{A}_\infty \subseteq \mathcal{A}_j$, and the convergence of
$\{f_j\}_{j=1}^\infty$ to $f_\infty$ in $L^2(X, \mathcal{A})$.

If $f \in L^p(X, \mathcal{A})$, $1 \le p < 2$, then $L^2(X, \mathcal{A})$ is a
dense linear subspace of $L^p(X, \mathcal{A})$, and one can use this to show
that $\{f_j\}_{j=1}^\infty$ converges to $E(f \mid \mathcal{A}_\infty)$ in the
$L^p$ norm. This also uses the fact that the conditional expectation operators
have operator norm $1$ on $L^p$ for each $p$. If
$f \in L^\infty(X, \mathcal{A})$, then $f_j \in L^\infty(X, \mathcal{A}_j)$
with $\|f_j\|_\infty \le \|f\|_\infty$ for each $j$. This together with
convergence in $L^2(X, \mathcal{A})$ implies convergence in
$L^p(X, \mathcal{A})$ for every $p < \infty$. If $f \in L^p(X, \mathcal{A})$,
$2 < p < \infty$, then one can show again that $\{f_j\}_{j=1}^\infty$ converges
to $E(f \mid \mathcal{A}_\infty)$ in the $L^p$ norm, since this holds on the
dense linear subspace $L^\infty(X, \mathcal{A})$ of $L^p(X, \mathcal{A})$, and
because the expectation operators are uniformly bounded on $L^p$.
There are also maximal function estimates in this context. To see this, one
can begin by observing that

(89.12)    $f_n^*(x) = \max_{1 \le j \le n} |f_j(x)|$

is basically the same as before, because one can simply rearrange the indices to
get an increasing sequence of $n$ $\sigma$-algebras. Hence the estimates for
$f_n^*$ are the same as before, and the corresponding estimates for

(89.13)    $f^*(x) = \sup_{j \ge 1} |f_j(x)|$

can be obtained by passing to the limit as $n \to \infty$. Convergence almost
everywhere then follows from convergence in the $L^1$ norm, as in Section 85.



90 Doubly-infinite sequences 

A probability space $(X, \mathcal{A}, \mu)$ may also have a doubly-infinite sequence

(90.1)    $\cdots \subseteq \mathcal{B}_{-1} \subseteq \mathcal{B}_0 \subseteq \mathcal{B}_1 \subseteq \cdots$

of $\sigma$-subalgebras of $\mathcal{A}$. In particular, this occurs very naturally in the context
of doubly-infinite products. Let $(X_j, \mathcal{A}_j, \mu_j)$, $j \in \mathbf{Z}$, be a family of probability
spaces indexed by the integers, and let $X = \prod_{j=-\infty}^\infty X_j$ be their Cartesian
product, equipped with the product measure $\mu$. Thus $X$ consists of the doubly-infinite
sequences $x = \{x_j\}_{j=-\infty}^\infty$ such that $x_j \in X_j$ for each $j$. If $\mathcal{B}_n$ is the
collection of subsets of $X$ of the form $A \times \prod_{j=n+1}^\infty X_j$, where $A$ is a measurable
subset of $\prod_{j=-\infty}^n X_j$, then $\mathcal{B}_n$ is a $\sigma$-subalgebra of the $\sigma$-algebra of measurable
subsets of $X$, and $\mathcal{B}_n \subseteq \mathcal{B}_{n+1}$ for each $n$.

Suppose that $(X_j, \mathcal{A}_j, \mu_j)$ is a copy of the same probability space for each $j$.
In this case, we can define the shift mapping $T : X \to X$ by $T(x) = y$, where
$x = \{x_j\}_{j=-\infty}^\infty$, $y = \{y_j\}_{j=-\infty}^\infty \in X$ satisfy

(90.2)    $y_j = x_{j-1}$

for each $j$. If $A \subseteq X$ is measurable, then $T(A)$ is also measurable, and

(90.3)    $\mu(T(A)) = \mu(A).$

Similarly, $T$ maps $\mathcal{B}_n$ onto $\mathcal{B}_{n+1}$ for each $n$.

If the $X_j$'s are compact Hausdorff topological spaces, then $X$ is too, with
respect to the product topology. If the $X_j$'s are all copies of the same topological
space, then $T$ is a homeomorphism. If the $X_j$'s are all metrizable, then $X$ is
as well, as in Section 69. However, this does not mean that there is a metric
$d(x, y)$ on $X$ that determines the product topology and which is invariant under
$T$ in the sense that

(90.4)    $d(T(x), T(y)) = d(x, y)$

for every $x, y \in X$. If $x, y \in X$ satisfy $x_j = x_l$ for every $j, l \in \mathbf{Z}$ and $x_j = y_j$ for
all but exactly one $j \in \mathbf{Z}$, then $T(x) = x$ and $\lim_{n \to \infty} T^n(y) = x$, which would
not be possible if there were an invariant metric.


91 Submartingales

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. Also let $\{f_j\}_{j=1}^\infty$ be a sequence of real-valued
functions on $X$ such that $f_j \in L^1(X, \mathcal{B}_j)$ for each $j$. We say that $\{f_j\}_{j=1}^\infty$ is a
submartingale on $X$ with respect to this filtration if

(91.1)    $f_j \le E(f_{j+1} \mid \mathcal{B}_j)$

almost everywhere on $X$ with respect to $\mu$ for each $j$. Similarly, $\{f_j\}_{j=1}^\infty$ is a
supermartingale if

(91.2)    $f_j \ge E(f_{j+1} \mid \mathcal{B}_j)$

almost everywhere on $X$ for each $j$. Thus $\{f_j\}_{j=1}^\infty$ is a martingale if and only if it
is both a submartingale and a supermartingale, and $\{f_j\}_{j=1}^\infty$ is a supermartingale
if and only if $\{-f_j\}_{j=1}^\infty$ is a submartingale.

If $\{g_j\}_{j=1}^\infty$ is a real or complex martingale on $X$ with respect to the $\mathcal{B}_j$'s, then
$\{|g_j|\}_{j=1}^\infty$ is a submartingale on $X$. If in addition $g_j \in L^p(X, \mathcal{B}_j)$ for some $p$,
$1 \le p < \infty$, and each $j$, then $\{|g_j|^p\}_{j=1}^\infty$ is a submartingale as well. More
generally, if $\phi$ is a convex function on an interval $I$ in the real line, which may
be unbounded, and if $g_j$ takes values in $I$ and $\phi \circ g_j \in L^1(X, \mathcal{B}_j)$ for each $j$, then
$\{\phi \circ g_j\}_{j=1}^\infty$ is a submartingale. These statements use the remarks in Section 78.
The latter also works when $\{g_j\}_{j=1}^\infty$ is a submartingale and $\phi$ is both convex
and monotone increasing on $I$.

If $\{f_j\}_{j=1}^\infty$ is a submartingale on $X$ and $a$ is a nonnegative real number, then
$\{a f_j\}_{j=1}^\infty$ is a submartingale. If $\{f_j\}_{j=1}^\infty$, $\{g_j\}_{j=1}^\infty$ are submartingales, then their
sum $\{f_j + g_j\}_{j=1}^\infty$ is a submartingale too. Their maximum $\{\max(f_j, g_j)\}_{j=1}^\infty$ is a
submartingale as well, because

(91.3)    $f_j \le E(f_{j+1} \mid \mathcal{B}_j) \le E(\max(f_{j+1}, g_{j+1}) \mid \mathcal{B}_j)$

and

(91.4)    $g_j \le E(g_{j+1} \mid \mathcal{B}_j) \le E(\max(f_{j+1}, g_{j+1}) \mid \mathcal{B}_j)$

imply that

(91.5)    $\max(f_j, g_j) \le E(\max(f_{j+1}, g_{j+1}) \mid \mathcal{B}_j).$

Of course, $\{f_j + g_j\}_{j=1}^\infty$ is a martingale when $\{f_j\}_{j=1}^\infty$, $\{g_j\}_{j=1}^\infty$ are martingales,
but $\{\max(f_j, g_j)\}_{j=1}^\infty$ is not normally a martingale in this case.

Let $\{f_j\}_{j=1}^\infty$ be a sequence of real-valued functions on $X$ with $f_j \in L^1(X, \mathcal{B}_j)$
for each $j$, as before. Thus $\{f_j\}_{j=1}^\infty$ is determined by the initial function $f_1$ and
the sequence of differences $f_{j+1} - f_j$. The condition that $\{f_j\}_{j=1}^\infty$ be a martingale
can be expressed by

(91.6)    $E(f_{j+1} - f_j \mid \mathcal{B}_j) = 0$

for each $j$, while the condition that $\{f_j\}_{j=1}^\infty$ be a submartingale is expressed by

(91.7)    $E(f_{j+1} - f_j \mid \mathcal{B}_j) \ge 0.$

Suppose that $\{f_j\}_{j=1}^\infty$ is a submartingale, and put

(91.8)    $a_j = E(f_{j+1} - f_j \mid \mathcal{B}_j) \ge 0$

for each $j$. Also put $A_l = \sum_{j=1}^{l-1} a_j$ when $l \ge 2$, and $A_1 = 0$. Note that
$A_l \in L^1(X, \mathcal{B}_{l-1})$ when $l \ge 2$, and $A_l(x)$ is monotone increasing in $l$ for each
$x \in X$. By construction, $\{f_l - A_l\}_{l=1}^\infty$ is a martingale, because

(91.9)    $(f_{l+1} - A_{l+1}) - (f_l - A_l) = f_{l+1} - f_l - a_l$

and

(91.10)    $E(f_{l+1} - f_l - a_l \mid \mathcal{B}_l) = E(f_{l+1} - f_l \mid \mathcal{B}_l) - E(a_l \mid \mathcal{B}_l)$
           $= E(f_{l+1} - f_l \mid \mathcal{B}_l) - a_l = 0.$

Conversely, if $\{\phi_j\}_{j=1}^\infty$ is any sequence of real-valued functions on $X$ such
that $\phi_j \in L^1(X, \mathcal{B}_j)$ and $\phi_j \le \phi_{j+1}$ for each $j$, then $\{\phi_j\}_{j=1}^\infty$ is a submartingale
on $X$. If $\{\psi_j\}_{j=1}^\infty$ is a martingale on $X$, then $\{\phi_j + \psi_j\}_{j=1}^\infty$ is also a submartingale.
Every submartingale on $X$ can be represented in this way, by the remarks in
the previous paragraph.
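
The decomposition in (91.8)-(91.10) can be checked by hand in a small case. The following Python sketch is only an illustration under assumptions that are not part of the notes: $X = \{1,-1\}^N$ with the uniform measure, $\mathcal{B}_k$ generated by the first $k$ coordinates, and the submartingale $f_j(x) = (x_1 + \cdots + x_j)^2$. The names `cond_exp`, `f`, `a`, `A` and `is_martingale_step` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

N = 4  # number of coin flips; X = {1,-1}^N with the uniform measure (an assumption)

def cond_exp(f, k):
    """E(f | B_k): average f over the coordinates after the first k."""
    def g(x):
        tails = list(product((1, -1), repeat=N - k))
        return sum(Fraction(f(x[:k] + t)) for t in tails) / len(tails)
    return g

def f(j):
    """Illustrative submartingale f_j(x) = (x_1 + ... + x_j)^2."""
    return lambda x: sum(x[:j]) ** 2

def a(j):
    """a_j = E(f_{j+1} - f_j | B_j), as in (91.8)."""
    return cond_exp(lambda x: f(j + 1)(x) - f(j)(x), j)

def A(l):
    """A_l = a_1 + ... + a_{l-1}, with A_1 = 0."""
    return lambda x: sum((a(j)(x) for j in range(1, l)), Fraction(0))

def is_martingale_step(l):
    """Check E((f_{l+1} - A_{l+1}) - (f_l - A_l) | B_l) = 0 at every point."""
    diff = lambda x: (f(l + 1)(x) - A(l + 1)(x)) - (f(l)(x) - A(l)(x))
    e = cond_exp(diff, l)
    return all(e(x) == 0 for x in product((1, -1), repeat=N))
```

In this example $a_j = 1$ for each $j$, so that $A_l = l - 1$ and $f_l - A_l$ is the familiar martingale $(x_1 + \cdots + x_l)^2 - (l - 1)$.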

Suppose that $f_j = \phi_j + \psi_j$ is a submartingale on $X$, where $\{\psi_j\}_{j=1}^\infty$ is a
martingale, and $\phi_j \le \phi_{j+1}$ for each $j$. If the integrals

(91.11)    $\int_X f_j \, d\mu$

have an upper bound in $\mathbf{R}$, then the integrals

(91.12)    $\int_X \phi_j \, d\mu$

also have an upper bound in $\mathbf{R}$, because $\int_X \psi_j \, d\mu$ is constant in $j$, by hypothesis.
This implies that $\{\phi_j\}_{j=1}^\infty$ converges pointwise almost everywhere on $X$ and in
the $L^1$ norm, by the monotone convergence theorem. In particular, the $\phi_j$'s have
bounded $L^1$ norms. If the $f_j$'s have bounded $L^1$ norms, then it follows that the
$\psi_j$'s have bounded $L^1$ norms too. This implies that $\{\psi_j\}_{j=1}^\infty$ converges pointwise
almost everywhere on $X$, as in Section 55, and hence that $\{f_j\}_{j=1}^\infty$ converges
pointwise almost everywhere on $X$ as well. Similarly, $\{\psi_j\}_{j=1}^\infty$ converges in the
$L^1$ norm when $\{f_j\}_{j=1}^\infty$ converges in the $L^1$ norm. Conversely, $\{f_j\}_{j=1}^\infty$ converges
in the $L^1$ norm when $\{\psi_j\}_{j=1}^\infty$ converges in the $L^1$ norm and the integrals (91.11)
have an upper bound in $\mathbf{R}$. If $\{f_j\}_{j=1}^\infty$ is uniformly integrable, then $\{\psi_j\}_{j=1}^\infty$ is
uniformly integrable, because $\{\phi_j\}_{j=1}^\infty$ converges in $L^1$ and hence is uniformly
integrable. This implies that $\{\psi_j\}_{j=1}^\infty$ converges in $L^1$ too, as in Section 53, so
that $\{f_j\}_{j=1}^\infty$ converges in $L^1$ as well, as in the case of martingales.

Let $\{f_j\}_{j=1}^\infty$ be a submartingale on $X$, and observe that

(91.13)    $\int_X f_j \, d\mu \le \int_X E(f_{j+1} \mid \mathcal{B}_j) \, d\mu = \int_X f_{j+1} \, d\mu$

for each $j$. If $j \ge l$, then

(91.14)    $E(f_j \mid \mathcal{B}_l) \le E(E(f_{j+1} \mid \mathcal{B}_j) \mid \mathcal{B}_l) = E(f_{j+1} \mid \mathcal{B}_l),$

and

(91.15)    $\int_X (E(f_j \mid \mathcal{B}_l) - f_l) \, d\mu = \int_X f_j \, d\mu - \int_X f_l \, d\mu.$

Suppose that $\int_X f_j \, d\mu$ has an upper bound in $\mathbf{R}$, and hence converges in $\mathbf{R}$, by
monotonicity. The monotone convergence theorem implies that $\{E(f_j \mid \mathcal{B}_l)\}_{j=l}^\infty$
converges in $L^1(X, \mathcal{B}_l)$ for each $l$. It is easy to check that the limit $g_l$ satisfies

(91.16)    $g_l = E(g_{l+1} \mid \mathcal{B}_l)$

for each $l$, because

(91.17)    $E(E(f_j \mid \mathcal{B}_{l+1}) \mid \mathcal{B}_l) = E(f_j \mid \mathcal{B}_l)$

for each $j$, $l$. Thus $\{g_l\}_{l=1}^\infty$ is a martingale, and

(91.18)    $f_l \le E(f_j \mid \mathcal{B}_l) \le g_l$

when $j \ge l$, by construction. Moreover,

(91.19)    $\int_X g_l \, d\mu = \lim_{j \to \infty} \int_X E(f_j \mid \mathcal{B}_l) \, d\mu = \lim_{j \to \infty} \int_X f_j \, d\mu$

for each $l$, which implies that

(91.20)    $\lim_{l \to \infty} \int_X (g_l - f_l) \, d\mu = 0,$

since $\int_X g_l \, d\mu$ is constant in $l$.

Conversely, if $\{g'_j\}_{j=1}^\infty$ is a martingale on $X$ such that $f_j \le g'_j$ for each $j$,
then

(91.21)    $\int_X f_j \, d\mu \le \int_X g'_j \, d\mu$

has an upper bound in $\mathbf{R}$, because $\int_X g'_j \, d\mu$ is constant in $j$. In addition,

(91.22)    $E(f_j \mid \mathcal{B}_l) \le E(g'_j \mid \mathcal{B}_l) = g'_l$

when $j \ge l$, which implies that $g_l \le g'_l$ for each $l$, where $g_l$ is as in the preceding
paragraph.

Let $\{f_j\}_{j=1}^\infty$ be a submartingale on $X$ again, and put

(91.23)    $f_n^*(x) = \max(f_1(x), \ldots, f_n(x)).$

This is a bit different from the situation for martingales discussed in Section 54,
since we do not take the absolute values of the functions. However, if $\{g_j\}_{j=1}^\infty$
is a martingale, then $f_j = |g_j|$ is a submartingale, and

(91.24)    $f_n^*(x) = \max(|g_1(x)|, \ldots, |g_n(x)|)$

is the same as before. Note that $f_n^*$ is measurable with respect to $\mathcal{B}_n$, as before.
Put

(91.25)    $A_n(t) = \{x \in X : f_n^*(x) > t\}$

for each $n \ge 1$ and $t \in \mathbf{R}$, and $A_0(t) = \emptyset$. Thus $A_n(t) \in \mathcal{B}_n$ for each $n \ge 1$, and
$A_n(t) \subseteq A_{n+1}(t)$. Observe that

(91.26)    $A_l(t) \setminus A_{l-1}(t) = \{x \in X : f_{l-1}^*(x) \le t, \ f_l(x) > t\}$

when $l \ge 2$, and that

(91.27)    $A_1(t) \setminus A_0(t) = A_1(t) = \{x \in X : f_1(x) > t\}.$

In particular, $f_l > t$ on $A_l(t) \setminus A_{l-1}(t)$, and so

(91.28)    $t \, \mu(A_l(t) \setminus A_{l-1}(t)) \le \int_{A_l(t) \setminus A_{l-1}(t)} f_l \, d\mu.$

This implies that

(91.29)    $t \, \mu(A_l(t) \setminus A_{l-1}(t)) \le \int_{A_l(t) \setminus A_{l-1}(t)} E(f_n \mid \mathcal{B}_l) \, d\mu = \int_{A_l(t) \setminus A_{l-1}(t)} f_n \, d\mu$

when $1 \le l \le n$, because $f_l \le E(f_n \mid \mathcal{B}_l)$, since $\{f_j\}_{j=1}^\infty$ is a submartingale, and
$A_l(t) \setminus A_{l-1}(t) \in \mathcal{B}_l$. Of course, the sets $A_l(t) \setminus A_{l-1}(t)$, $1 \le l \le n$, are pairwise
disjoint, and their union is $A_n(t)$. Hence

(91.30)    $t \, \mu(A_n(t)) = \sum_{l=1}^n t \, \mu(A_l(t) \setminus A_{l-1}(t)) \le \sum_{l=1}^n \int_{A_l(t) \setminus A_{l-1}(t)} f_n \, d\mu = \int_{A_n(t)} f_n \, d\mu$

for each $n \ge 1$ and $t \in \mathbf{R}$.
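
As a sanity check on (91.30), both sides can be computed exactly in a small example. The sketch below assumes, purely for illustration, that $X = \{1,-1\}^n$ with the uniform measure and that $f_j(x) = |x_1 + \cdots + x_j|$, which is a submartingale because the partial sums form a martingale. The function name `maximal_inequality_sides` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def maximal_inequality_sides(n, t):
    """Return (t * mu(A_n(t)), integral of f_n over A_n(t)) for f_j = |x_1 + ... + x_j|."""
    points = list(product((1, -1), repeat=n))
    mass = Fraction(1, len(points))          # uniform probability of each point
    lhs = rhs = Fraction(0)
    for x in points:
        partial = [abs(sum(x[:j])) for j in range(1, n + 1)]  # f_1(x), ..., f_n(x)
        if max(partial) > t:                 # x lies in A_n(t)
            lhs += t * mass
            rhs += partial[-1] * mass        # contribution of f_n(x)
    return lhs, rhs
```

For instance, with $n = 2$ and $t = 1$ the set $A_2(1)$ consists of the two constant sequences, so the left side is $1/2$ and the right side is $1$.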

92 Another variant 

Let $(X, \mathcal{A}, \mu)$ be a probability space, let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing sequence
of $\sigma$-subalgebras of $\mathcal{A}$, and let $\{f_j\}_{j=1}^\infty$ be a sequence of functions on $X$ such
that $f_j \in L^1(X, \mathcal{B}_j)$ for each $j$. As in the previous section, put

(92.1)    $a_j = E(f_{j+1} - f_j \mid \mathcal{B}_j)$

for each $j$, $A_l = \sum_{j=1}^{l-1} a_j$ when $l \ge 2$, and $A_1 = 0$. Thus $A_l \in L^1(X, \mathcal{B}_{l-1})$ when
$l \ge 2$, and $\{f_l - A_l\}_{l=1}^\infty$ is a martingale, as before. If $f_j \in L^p(X, \mathcal{B}_j)$ for some
$p \ge 1$ and each $j$, and

(92.2)    $\sum_{j=1}^\infty \|f_{j+1} - f_j\|_p$

converges, then $\{f_j\}_{j=1}^\infty$ converges in the $L^p$ norm and pointwise almost everywhere
on $X$. Suppose instead that $f_j \in L^p(X, \mathcal{B}_j)$ for each $j$, $\|f_j\|_p$ is bounded,
and

(92.3)    $\sum_{j=1}^\infty \|a_j\|_p$

converges. This implies that $\{A_l\}_{l=1}^\infty$ converges in the $L^p$ norm and pointwise
almost everywhere on $X$, and that $\|f_l - A_l\|_p$ is bounded. Because $\{f_l - A_l\}_{l=1}^\infty$ is
a martingale, it follows that $\{f_l - A_l\}_{l=1}^\infty$ converges pointwise almost everywhere
on $X$, and in the $L^p$ norm when $1 < p < \infty$.



93 Averaging functions 

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2), \ldots$ be a sequence of probability spaces, and let
$X = \prod_{j=1}^\infty X_j$ be their Cartesian product, with the product measure $\mu$. Also
let $\mathcal{B}_n$ be the $\sigma$-algebra of measurable subsets of $X$ of the form $A \times \prod_{j=n+1}^\infty X_j$,
where $A$ is a measurable subset of $\prod_{j=1}^n X_j$. Suppose that $\phi_j \in L^2(X_j, \mathcal{A}_j)$
satisfies

(93.1)    $\int_{X_j} \phi_j \, d\mu_j = 0$

and

(93.2)    $\Big( \int_{X_j} |\phi_j|^2 \, d\mu_j \Big)^{1/2} \le C$

for some $C \ge 0$ and each $j$, and consider

(93.3)    $f_n(x) = \frac{1}{n} \sum_{j=1}^n \phi_j(x_j),$

$x = \{x_j\}_{j=1}^\infty \in X$. Thus $f_n \in L^2(X, \mathcal{B}_n)$ for each $n$, and

(93.4)    $\|f_n\|_2^2 = \frac{1}{n^2} \sum_{j=1}^n \|\phi_j\|_2^2 \le \frac{C^2}{n},$

because of orthogonality. In particular, $f_n \to 0$ in $L^2(X)$ as $n \to \infty$.
Observe that

(93.5)    $f_{n+1}(x) - f_n(x) = \frac{1}{n+1} \sum_{j=1}^{n+1} \phi_j(x_j) - \frac{1}{n} \sum_{j=1}^n \phi_j(x_j)$
          $= \frac{\phi_{n+1}(x_{n+1})}{n+1} - \frac{1}{n(n+1)} \sum_{j=1}^n \phi_j(x_j).$

If $a_n = E(f_{n+1} - f_n \mid \mathcal{B}_n)$, as in the previous section, then

(93.6)    $a_n(x) = -\frac{1}{n(n+1)} \sum_{j=1}^n \phi_j(x_j).$

This is because $\phi_j(x_j)$ is measurable with respect to $\mathcal{B}_n$ when $j \le n$, while the
conditional expectation of $\phi_{n+1}(x_{n+1})$ with respect to $\mathcal{B}_n$ is equal to 0. Thus
$a_n = -(1/(n+1)) f_n$,

(93.7)    $\|a_n\|_2 \le \frac{C}{\sqrt{n} \, (n+1)},$

and so $\sum_{n=1}^\infty \|a_n\|_2$ converges. It follows that $\{f_n\}_{n=1}^\infty$ converges pointwise
almost everywhere on $X$, as in the previous section.
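
The identities $a_n = -(1/(n+1)) f_n$ and (93.4) are easy to confirm in the simplest case. The sketch below assumes that each $X_j = \{1,-1\}$ with the uniform measure and that $\phi_j(x_j) = x_j$, so that $C = 1$; these choices, and the names `f`, `a` and `l2_norm_squared`, are illustrative only.

```python
from fractions import Fraction
from itertools import product

def f(n, x):
    """f_n(x) = (1/n) * (phi_1(x_1) + ... + phi_n(x_n)) with phi_j(x_j) = x_j."""
    return Fraction(sum(x[:n]), n)

def a(n, x):
    """a_n(x) = E(f_{n+1} - f_n | B_n)(x): average over the (n+1)-st coordinate."""
    head = tuple(x[:n])
    return sum(f(n + 1, head + (s,)) - f(n, head) for s in (1, -1)) / 2

def l2_norm_squared(n):
    """||f_n||_2^2 on {1,-1}^n with the uniform product measure."""
    points = list(product((1, -1), repeat=n))
    return sum(f(n, x) ** 2 for x in points) / len(points)
```

Here $\|f_n\|_2^2$ is exactly $1/n$, which is the case of equality in (93.4) with $C = 1$.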



94 Shift mappings 

Let $(X_0, \mathcal{A}_0, \mu_0)$ be a probability space, and let $X$ be the space of doubly-infinite
sequences $x = \{x_j\}_{j=-\infty}^\infty$ with $x_j \in X_0$ for each $j$. Thus $X$ is the
Cartesian product of a family of copies of $X_0$ indexed by the integers, which is
also a probability space with respect to the product measure $\mu$. Let $T$ be the
shift mapping on $X$ defined in Section 90, which preserves the measure $\mu$. Also
let $f$ be an integrable function on $X$, and consider

(94.1)    $\frac{f(x) + f(T(x)) + f(T^2(x)) + \cdots + f(T^n(x))}{n+1}.$

If $f$ is constant, then (94.1) is the same constant for each $n$. Suppose instead
that the integral of $f$ is equal to 0. If $f$ is square-integrable and depends only on
one variable, then (94.1) converges to 0 as $n \to \infty$ in the $L^2$ norm and pointwise
almost everywhere on $X$, as in the previous section. These are consequences of
well-known ergodic theorems as well. One can also deal with other $L^p$ spaces,
but let us focus here on $p = 2$ for simplicity. If $f$ depends on only finitely many
variables, then one can get the same conclusions from analogous arguments.
More precisely, one can begin with averages like (94.1), but using powers of
$T^r$ for sufficiently large $r$ in place of powers of $T$. An average like (94.1) with
arbitrary powers of $T$ can then be estimated in terms of $r$ smaller averages
involving $T^{jr+l}$, $l = 0, \ldots, r-1$. After that, an arbitrary function $f$ can be
approximated by functions depending on only finitely many variables. There
are also maximal function estimates for the averages (94.1) like those that have
been discussed in other contexts.



95 Families of $\sigma$-subalgebras

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $(\mathcal{I}, \preceq)$ be a directed system. Thus
$\mathcal{I}$ is a set, $\preceq$ is a partial ordering on $\mathcal{I}$, and for each $a, b \in \mathcal{I}$ there is a $c \in \mathcal{I}$
such that $a, b \preceq c$. Suppose that for each $a \in \mathcal{I}$ we have a $\sigma$-subalgebra $\mathcal{B}_a$ of
$\mathcal{A}$, and that

(95.1)    $\mathcal{B}_a \subseteq \mathcal{B}_b$

when $a, b \in \mathcal{I}$ and $a \preceq b$. If $\mathcal{I}$ is the set $\mathbf{Z}_+$ of positive integers with the usual
ordering, then this is the same as an increasing sequence of $\sigma$-subalgebras of $\mathcal{A}$,
as in Section 80.

Alternatively, let $I$ be a nonempty set, and let $(X_i, \mathcal{A}_i, \mu_i)$ be a probability
space for each $i \in I$. Consider the Cartesian product $X = \prod_{i \in I} X_i$ of the $X_i$'s, with
the product measure $\mu$. If $\mathcal{I}$ is the collection of nonempty finite subsets of $I$,
then $\mathcal{I}$ is partially ordered by inclusion, and a directed system. More precisely,
if $a, b \in \mathcal{I}$, then $a \cup b \in \mathcal{I}$, and $a, b \subseteq a \cup b$. Let $\mathcal{B}_a$ be the collection of subsets of
$X$ that correspond to the Cartesian product of a measurable set $A \subseteq \prod_{i \in a} X_i$
and $\prod_{i \in I \setminus a} X_i$ for each $a \in \mathcal{I}$. It is easy to see that $\mathcal{B}_a$ is a $\sigma$-subalgebra of
the $\sigma$-algebra of measurable subsets of $X$, and that (95.1) holds. If the $X_i$'s are
compact Hausdorff topological spaces, so that $X$ is also a compact Hausdorff
space with respect to the product topology, then one may wish to use Borel
sets.

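In this product situation, suppose that $\phi_i \in L^1(X_i, \mathcal{A}_i)$ satisfies

(95.2)    $\int_{X_i} \phi_i \, d\mu_i = 0$

for each $i \in I$. Put

(95.3)    $\Phi_a(x) = \sum_{i \in a} \phi_i(x_i)$

for each $a \in \mathcal{I}$, where $x = \{x_i\}_{i \in I} \in X$. Thus $\Phi_a \in L^1(X, \mathcal{B}_a)$, and

(95.4)    $E(\Phi_b \mid \mathcal{B}_a) = \Phi_a$

when $a, b \in \mathcal{I}$ and $a \subseteq b$. Hence $\Phi_a$, $a \in \mathcal{I}$, defines a martingale with respect to
this family of $\sigma$-algebras.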

Martingales with more general indices like this are discussed in [152]. This
point of view is very natural in connection with rearrangement of sums, and
convergence of sums in the generalized sense. Note that the arguments for
estimating maximal functions as in Section 84 do not work for partially-ordered
sets of indices. The corresponding problems with pointwise convergence have
already been seen at least implicitly in Section [HU in the case where $I = \mathbf{Z}_+$,
$X_i = \{1, -1\}$, and $\mu_i(\{1\}) = \mu_i(\{-1\}) = 1/2$ for each $i \in I$. However, if the
$\sigma$-algebras $\mathcal{B}_a$ are associated to partitions consisting of intervals in the real line,
then one can use a covering argument as in Section 46.

96 Stopping times 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. A function $\tau : X \to \mathbf{Z}_+$ is said to be a stopping
time if

(96.1)    $\tau^{-1}(n) = \{x \in X : \tau(x) = n\} \in \mathcal{B}_n$

for each $n \ge 1$. This is equivalent to the condition that

(96.2)    $\tau^{-1}(\{1, \ldots, n\}) = \{x \in X : \tau(x) \le n\} \in \mathcal{B}_n$

for each $n$, since

(96.3)    $\tau^{-1}(\{1, \ldots, n\}) = \bigcup_{l=1}^n \tau^{-1}(l)$

and

(96.4)    $\tau^{-1}(n) = \tau^{-1}(\{1, \ldots, n\}) \setminus \tau^{-1}(\{1, \ldots, n-1\})$

when $n \ge 2$. Alternatively, $\tau$ is a stopping time if

(96.5)    $\{x \in X : \tau(x) > n\} \in \mathcal{B}_n$

for each $n$, because

(96.6)    $\{x \in X : \tau(x) > n\} = X \setminus \tau^{-1}(\{1, \ldots, n\}).$

One can also allow $\tau$ to take values in $\mathbf{Z}_+ \cup \{+\infty\}$, in which case

(96.7)    $\tau^{-1}(+\infty) = X \setminus \Big( \bigcup_{n=1}^\infty \tau^{-1}(n) \Big)$

is in the $\sigma$-algebra $\mathcal{B}_\infty$ generated by $\bigcup_{n=1}^\infty \mathcal{B}_n$.

If $A_1, A_2, \ldots$ is a sequence of pairwise-disjoint subsets of $X$ with $A_n \in \mathcal{B}_n$ for
each $n$, then there is a unique stopping time $\tau$ on $X$ such that $\tau^{-1}(n) = A_n$ for
each $n$. More precisely, $\tau(x) < +\infty$ for every $x \in X$ if and only if $\bigcup_{n=1}^\infty A_n = X$.
Similarly, if $E_1 \subseteq E_2 \subseteq \cdots$ is an increasing sequence of subsets of $X$ with
$E_n \in \mathcal{B}_n$ for each $n$, then there is a unique stopping time $\tau$ on $X$ such that

(96.8)    $\{x \in X : \tau(x) \le n\} = E_n$

for each $n$. Of course, this corresponds to taking $A_1 = E_1$ and $A_n = E_n \setminus E_{n-1}$
when $n \ge 2$ in the previous statement. As before, $\tau(x) < +\infty$ for every $x \in X$
if and only if $\bigcup_{n=1}^\infty E_n = X$.

If $\tau$, $\tau'$ are stopping times on $X$, then $\max(\tau, \tau')$ and $\min(\tau, \tau')$ are stopping
times too, because

(96.9)    $\{x \in X : \max(\tau(x), \tau'(x)) \le n\} = \{x \in X : \tau(x) \le n\} \cap \{x \in X : \tau'(x) \le n\}$

and

(96.10)    $\{x \in X : \min(\tau(x), \tau'(x)) \le n\} = \{x \in X : \tau(x) \le n\} \cup \{x \in X : \tau'(x) \le n\}.$

In particular,

(96.11)    $\tau_N(x) = \min(\tau(x), N)$

is a stopping time on $X$ when $\tau$ is a stopping time and $N$ is a positive integer.

Suppose that $\{f_n\}_{n=1}^\infty$ is a martingale on $X$ with respect to this filtration,
and let

(96.12)    $f^*(x) = \sup_{n \ge 1} |f_n(x)|$

be the corresponding maximal function. Let $t > 0$ be given, and remember that
$f^*(x) > t$ if and only if $|f_n(x)| > t$ for some $n$. If $f^*(x) > t$, then let $\tau(x)$ be
the smallest positive integer such that

(96.13)    $|f_{\tau(x)}(x)| > t,$

and put $\tau(x) = +\infty$ when $f^*(x) \le t$. Thus $\tau(x) = n$ exactly when $|f_n(x)| > t$
and $|f_l(x)| \le t$ for $l < n$. This implies that $\tau^{-1}(n) \in \mathcal{B}_n$ for each $n$, because $f_l$
is measurable with respect to $\mathcal{B}_l \subseteq \mathcal{B}_n$ when $l \le n$.

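Let $\tau$ be a stopping time on $X$ such that $\tau(x) < +\infty$ for every $x \in X$, and
let $\mathcal{B}_\tau$ be the collection of subsets $A$ of $X$ such that

(96.14)    $A \cap \tau^{-1}(n) \in \mathcal{B}_n$

for each $n$. It is easy to see that this is a $\sigma$-algebra, because $\mathcal{B}_n$ is a $\sigma$-algebra
for each $n$, and that $\mathcal{B}_\tau \subseteq \mathcal{B}_\infty$. If $N$ is a positive integer and $\tau(x) \le N$ for each
$x \in X$, then

(96.15)    $\mathcal{B}_\tau \subseteq \mathcal{B}_N.$

More precisely, if $\tau$ is any finite stopping time, $A \in \mathcal{B}_\tau$, and $\tau(x) \le N$ for every
$x \in A$, then $A \in \mathcal{B}_N$. If $\tau'$ is another stopping time such that

(96.16)    $\tau(x) \le \tau'(x) < +\infty$

for every $x \in X$, then $\mathcal{B}_\tau \subseteq \mathcal{B}_{\tau'}$.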

Let $\{f_n\}_{n=1}^\infty$ be a martingale on $X$ with respect to this filtration, and let $\tau$
be a finite stopping time on $X$. If $f_\tau$ is the function on $X$ defined by

(96.17)    $f_\tau(x) = f_{\tau(x)}(x),$

then $f_\tau$ is measurable with respect to $\mathcal{B}_\tau$, because $f_n$ is measurable with respect
to $\mathcal{B}_n$ for each $n$. Let us check that

(96.18)    $\int_{\tau^{-1}(\{1, \ldots, N\})} |f_\tau| \, d\mu \le \int_X |f_N| \, d\mu$

for each positive integer $N$. By the definition of $f_\tau$,

(96.19)    $\int_{\tau^{-1}(\{1, \ldots, N\})} |f_\tau| \, d\mu = \sum_{n=1}^N \int_{\tau^{-1}(n)} |f_n| \, d\mu.$

Hence

(96.20)    $\int_{\tau^{-1}(\{1, \ldots, N\})} |f_\tau| \, d\mu \le \sum_{n=1}^N \int_{\tau^{-1}(n)} |f_N| \, d\mu = \int_{\tau^{-1}(\{1, \ldots, N\})} |f_N| \, d\mu,$

because $|f_n| \le E(|f_N| \mid \mathcal{B}_n)$ when $n \le N$.

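If $\tau(x) \le N$ for every $x \in X$, then it follows that $f_\tau$ is integrable on $X$. Let
us verify that

(96.21)    $f_\tau = E(f_N \mid \mathcal{B}_\tau),$

remembering that $\mathcal{B}_\tau \subseteq \mathcal{B}_N$ in this case. To see this, it suffices to show that

(96.22)    $\int_A f_\tau \, d\mu = \int_A f_N \, d\mu$

for every $A \in \mathcal{B}_\tau$. Under these conditions,

(96.23)    $\int_A f_\tau \, d\mu = \sum_{n=1}^N \int_{A \cap \tau^{-1}(n)} f_n \, d\mu = \sum_{n=1}^N \int_{A \cap \tau^{-1}(n)} f_N \, d\mu = \int_A f_N \, d\mu,$

because $A \cap \tau^{-1}(n) \in \mathcal{B}_n$ and $f_n = E(f_N \mid \mathcal{B}_n)$ when $n \le N$.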

Similarly, if the $f_n$'s have bounded $L^1$ norms and $\tau$ is any finite stopping
time on $X$, then we get that

(96.24)    $\int_{\tau^{-1}(\{1, \ldots, N\})} |f_\tau| \, d\mu \le \sup_{n \ge 1} \int_X |f_n| \, d\mu$

for every positive integer $N$. This implies that $f_\tau$ is integrable on $X$, and that

(96.25)    $\int_X |f_\tau| \, d\mu \le \sup_{n \ge 1} \int_X |f_n| \, d\mu.$

In particular, this holds when there is an $f \in L^1(X, \mathcal{A})$ such that $f_n = E(f \mid \mathcal{B}_n)$
for each $n$. In this case, one can check that

(96.26)    $f_\tau = E(f \mid \mathcal{B}_\tau).$

As before, one can show that

(96.27)    $\int_A f_\tau \, d\mu = \int_A f \, d\mu$

when $A \in \mathcal{B}_\tau$, by expressing $A$ as the union of $A \cap \tau^{-1}(n)$, $n \ge 1$, and using the
fact that $f_\tau = f_n = E(f \mid \mathcal{B}_n)$ on $A \cap \tau^{-1}(n) \in \mathcal{B}_n$.

Now let $\tau$ be a stopping time on $X$ that takes values in $\mathbf{Z}_+ \cup \{+\infty\}$, so that
$\tau_N = \min(\tau, N)$ is a finite stopping time on $X$ for each $N$. Let $\{f_n\}_{n=1}^\infty$ be a
martingale on $X$ with respect to this filtration, and note that $f_{\tau_N}$ is integrable
on $X$ for each $N$, since $\tau_N$ is bounded. Let us check that

(96.28)    $f_{\tau_N} = E(f_{\tau_{N+1}} \mid \mathcal{B}_N)$

for each $N$, so that $\{f_{\tau_N}\}_{N=1}^\infty$ is a martingale as well. As usual, we would like
to show that

(96.29)    $\int_A f_{\tau_N} \, d\mu = \int_A f_{\tau_{N+1}} \, d\mu$

when $A \in \mathcal{B}_N$. Consider

(96.30)    $A_1 = \{x \in A : \tau(x) \le N\}$

and

(96.31)    $A_2 = \{x \in A : \tau(x) > N\}.$

Thus $A_1 \cup A_2 = A$, $A_1 \cap A_2 = \emptyset$, and

(96.32)    $A_1, A_2 \in \mathcal{B}_N,$

since $\tau$ is a stopping time. If $x \in A_1$, then $\tau_N(x) = \tau_{N+1}(x) = \tau(x)$, and hence
$f_{\tau_N}(x) = f_{\tau_{N+1}}(x) = f_\tau(x)$. This implies that

(96.33)    $\int_{A_1} f_{\tau_N} \, d\mu = \int_{A_1} f_{\tau_{N+1}} \, d\mu.$

If $x \in A_2$, then $\tau_N(x) = N$, $\tau_{N+1}(x) = N + 1$, and so $f_{\tau_N}(x) = f_N(x)$,
$f_{\tau_{N+1}}(x) = f_{N+1}(x)$. It follows that

(96.34)    $\int_{A_2} f_{\tau_N} \, d\mu = \int_{A_2} f_N \, d\mu = \int_{A_2} f_{N+1} \, d\mu = \int_{A_2} f_{\tau_{N+1}} \, d\mu,$

because $A_2 \in \mathcal{B}_N$ and $f_N = E(f_{N+1} \mid \mathcal{B}_N)$, as desired.
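
The identity (96.29) can be tested exactly for a simple stopped random walk. The sketch below assumes, only for illustration, that $X = \{1,-1\}^M$ with the uniform measure, that $f_n(x) = x_1 + \cdots + x_n$, and that $\tau$ is the first $n$ with $|f_n(x)| > t$, as in (96.13); the horizon `M` and the helper names are hypothetical. Sets in $\mathcal{B}_N$ are represented by prefixes of length $N$.

```python
from fractions import Fraction
from itertools import product

M = 6  # horizon: X = {1,-1}^M with the uniform measure (an assumption)

def S(n, x):
    """Martingale f_n(x) = x_1 + ... + x_n."""
    return sum(x[:n])

def tau(x, t):
    """First n with |f_n(x)| > t, or None (standing in for +infinity) if there is none."""
    for n in range(1, M + 1):
        if abs(S(n, x)) > t:
            return n
    return None

def stopped(N, x, t):
    """f_{tau_N}(x), where tau_N = min(tau, N)."""
    n = tau(x, t)
    return S(N if n is None or n > N else n, x)

def integral_over(N, t, prefix):
    """Integral of f_{tau_N} over the set of points of X that start with `prefix`."""
    k = len(prefix)
    tails = product((1, -1), repeat=M - k)
    return sum(Fraction(stopped(N, prefix + s, t), 2 ** M) for s in tails)
```

Comparing `integral_over(N, t, prefix)` with `integral_over(N + 1, t, prefix)` for each prefix of length $N$ checks (96.29) on the atoms of $\mathcal{B}_N$, which is enough because every element of $\mathcal{B}_N$ is a union of such atoms in this finite setting.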

If $\tau(x) < \infty$ for every $x \in X$, then $\{f_{\tau_N}\}_{N=1}^\infty$ converges to $f_\tau$ pointwise on
$X$, because $f_{\tau_N}(x) = f_\tau(x)$ when $N \ge \tau(x)$. If the $f_n$'s have bounded $L^1$ norms,
then the $f_{\tau_N}$'s also have bounded $L^1$ norms, and $f_\tau$ is integrable. A necessary
and sufficient condition for $\{f_{\tau_N}\}_{N=1}^\infty$ to converge to $f_\tau$ in the $L^1$ norm is that

(96.35)    $\int_{\{x \in X : \tau(x) > N\}} |f_{\tau_N}(x)| \, d\mu(x) = \int_{\{x \in X : \tau(x) > N\}} |f_N(x)| \, d\mu(x) \to 0$

as $N \to \infty$. This holds automatically when the $f_n$'s are uniformly integrable,
and otherwise depends on both the $f_n$'s and $\tau$.

97 Ultrametrics 

A metric $d(x, y)$ on a set $M$ is said to be an ultrametric if

(97.1)    $d(x, z) \le \max(d(x, y), d(y, z))$

for every $x, y, z \in M$. If $X_1, X_2, \ldots$ is a sequence of nonempty sets, and $r_1, r_2, \ldots$
is a decreasing sequence of positive real numbers that converges to 0, then one
can define an ultrametric on the Cartesian product $X = \prod_{j=1}^\infty X_j$ as follows.
Each element $x$ of $X$ is a sequence $\{x_j\}_{j=1}^\infty$ with $x_j \in X_j$ for every $j$, and we
put $d(x, x) = 0$, and

(97.2)    $d(x, y) = r_l$

when $x \ne y$ and $l$ is the smallest positive integer such that $x_l \ne y_l$. It is easy
to see that this is an ultrametric on $X$, and that the corresponding topology is
the product topology associated to the discrete topology on $X_j$ for each $j$.
If $d(x, y)$ is an ultrametric on a set $M$, $p, q \in M$, and $t \ge r > 0$, then either

(97.3)    $B(p, r) \subseteq B(q, t)$ or $B(p, r) \cap B(q, t) = \emptyset.$

More precisely, the first alternative holds when $d(p, q) < t$, and the second
alternative holds when $d(p, q) \ge t$. Using this, one can check that open balls are
closed subsets of ultrametric spaces. There is an analogous dichotomy for closed
balls, which implies that closed balls are open subsets of ultrametric spaces. It
follows that ultrametric spaces are totally disconnected, in the sense that they
do not contain connected subsets with more than one element.
Another consequence of the previous dichotomy is that 

(97.4) B(p,r) = B(q,r) 

when d(p, q) < r. Thus every element of an open ball in M can be used as 
a center of that ball. The collection of open balls in M with the same radius 
r forms a partition of M, because any two such balls are either the same or 
disjoint as subsets of M. If t > r, then the partition of M into open balls of 
radius r is a refinement of the partition of M into open balls of radius t, since 
every ball of radius r is contained in a ball of radius t. 
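
The ultrametric (97.2) and the partition property of balls can be explored on a finite truncation of the product. The sketch below is an illustration under assumptions not made in the notes: only three factors $X_1 = X_2 = X_3 = \{0, 1, 2\}$ are used, with radii $r_1 = 1$, $r_2 = 1/2$, $r_3 = 1/4$, and the helper names `d` and `ball` are hypothetical.

```python
from itertools import product

def d(x, y, r):
    """Distance (97.2): r[l] at the first index l where x and y differ, 0 if x == y."""
    for l, (xl, yl) in enumerate(zip(x, y)):
        if xl != yl:
            return r[l]
    return 0.0

def ball(p, radius, points, r):
    """Open ball B(p, radius) inside the finite set `points`."""
    return frozenset(q for q in points if d(p, q, r) < radius)

r = [1.0, 0.5, 0.25]                         # decreasing radii r_1 > r_2 > r_3
points = list(product((0, 1, 2), repeat=3))  # truncated product X_1 x X_2 x X_3
```

With these choices, the open balls of radius $1/2$ are the sets of points sharing their first two coordinates, so they split the 27 points into 9 balls of 3 points each.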

The geometry of an ultrametric space is very similar to a probability space
with an increasing sequence of $\sigma$-subalgebras of the $\sigma$-algebra of measurable sets.
In particular, one can consider $\sigma$-subalgebras of the Borel sets in an ultrametric
space corresponding to partitions by balls of a given radius. One can also
deal directly with Hardy-Littlewood type maximal functions, using the nesting
properties of balls to reduce a covering of a set by balls of bounded radius to
a disjoint union of balls that are maximal elements of the covering. Of course,
there are more complicated covering arguments for Euclidean spaces and other
metric spaces, including the basic property of intervals in the real line mentioned
in Section 46. These can also be used to estimate maximal functions, and so
on.


Part IV

Vector-valued functions



98 Some randomized sums 



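Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\phi_1, \ldots, \phi_n$ be bounded real or
complex-valued measurable functions on $X$, with

(98.1)    $\|\phi_j\|_\infty \le C$

for some $C \ge 0$ and $j = 1, \ldots, n$. Also let $\{1, -1\}^n$ be the set of sequences
$\epsilon = \{\epsilon_j\}_{j=1}^n$ of length $n$ with $\epsilon_j = 1$ or $-1$ for each $j$. If $2 < p < \infty$, then there
is a positive real number $C(p)$ such that

(98.2)    $2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^p \, d\mu(x) \le C(p) \Big( \sum_{j=1}^n |a_j|^2 \Big)^{p/2}$

for all $a_1, \ldots, a_n \in \mathbf{R}$ or $\mathbf{C}$, as appropriate. Of course, the left side is the same
as

(98.3)    $\int_X 2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^p \, d\mu(x).$

As in Section loTl,

(98.4)    $2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^p \le C'(p) \Big( \sum_{j=1}^n |a_j|^2 |\phi_j(x)|^2 \Big)^{p/2}$

for some $C'(p) > 0$ and all $a_1, \ldots, a_n \in \mathbf{R}$ or $\mathbf{C}$ and $x \in X$. This implies
(98.2), by integrating in $x$ and using the uniform boundedness of the $\phi_j$'s. More
precisely, $C(p)$ depends only on $C$ and $p$, and not on $a_1, \ldots, a_n$ or $n$.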
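If $p = 2$, then we have that

(98.5)    $2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^2 = \sum_{j=1}^n |a_j \phi_j(x)|^2.$

This implies that

(98.6)    $2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^2 \, d\mu(x) = \sum_{j=1}^n |a_j|^2$

when $\|\phi_j\|_2 = 1$ for each $j$. Otherwise, if $\|\phi_j\|_2 \ge c$ for some $c > 0$ and each $j$,
then we get that

(98.7)    $2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^2 \, d\mu(x) \ge c^2 \sum_{j=1}^n |a_j|^2.$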
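
The pointwise sign averages in (98.4) and (98.5) can be computed by brute force for small $n$. In the sketch below the values $a_j \phi_j(x)$ at a fixed point $x$ are replaced by integer coefficients `a`, which is an assumption made so that the arithmetic is exact; the function name `sign_average` is hypothetical. For $p = 4$ the average works out to $3 (\sum_j a_j^2)^2 - 2 \sum_j a_j^4$, so one may take $C'(4) = 3$ for real coefficients; this constant is a standard computation and is not stated in these notes.

```python
from fractions import Fraction
from itertools import product

def sign_average(a, p):
    """2^{-n} * sum over eps in {1,-1}^n of |sum_j eps_j a_j|^p, for integer a_j and p."""
    n = len(a)
    total = sum(abs(sum(e * aj for e, aj in zip(eps, a))) ** p
                for eps in product((1, -1), repeat=n))
    return Fraction(total, 2 ** n)
```

For example, with `a = (1, 2, 3)` the average for $p = 2$ is $1 + 4 + 9 = 14$, in agreement with (98.5), and the average for $p = 4$ is $392 \le 3 \cdot 14^2$.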

Note that

(98.8)    $\Big( 2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^p \, d\mu(x) \Big)^{1/p}$

is monotone increasing in $p$, by Jensen's inequality. This is the same as the $L^p$
norm of $\sum_{j=1}^n \epsilon_j a_j \phi_j(x)$ as a function of $(x, \epsilon)$ on $X \times \{1, -1\}^n$, with respect
to the product of $\mu$ on $X$ and $2^{-n}$ times counting measure on $\{1, -1\}^n$.
Under these conditions, if $0 < p < 2$, then there is a $C(p) > 0$ such that

(98.9)    $C(p)^{-1} \Big( \sum_{j=1}^n |a_j|^2 \Big)^{p/2} \le 2^{-n} \sum_{\epsilon \in \{1,-1\}^n} \int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^p \, d\mu(x)$

for all $a_1, \ldots, a_n \in \mathbf{R}$ or $\mathbf{C}$. This can be derived from the previous estimates
and Hölder's inequality, as in Section [(H]. More precisely, Hölder's inequality
can be used to estimate the $L^2$ norm of $\sum_{j=1}^n \epsilon_j a_j \phi_j(x)$ on $X \times \{1, -1\}^n$ in
terms of its $L^p$ and $L^4$ norms, as before. Under the present conditions, the $L^2$
norm is bounded from below by a constant multiple of $\big( \sum_{j=1}^n |a_j|^2 \big)^{1/2}$, and
the $L^4$ norm is bounded from above by a multiple of the same expression, which
leads to a lower bound for the $L^p$ norm as in (98.9). As usual, the constant
$C(p)$ in (98.9) depends on $c$, $C$, and $p$, and not on $a_1, \ldots, a_n$ or $n$.



99 Randomized sums, 2 

Let $(X, \mathcal{A}, \mu)$ be a probability space again, and let $\phi_1, \ldots, \phi_n$ be orthonormal
functions in $L^2(X)$. As usual, this implies that

(99.1)    $\int_X \Big| \sum_{j=1}^n a_j \phi_j(x) \Big|^2 \, d\mu(x) = \sum_{j=1}^n |a_j|^2$

for all $a_1, \ldots, a_n \in \mathbf{R}$ or $\mathbf{C}$, as appropriate. Hence

(99.2)    $\int_X \Big| \sum_{j=1}^n \epsilon_j a_j \phi_j(x) \Big|^2 \, d\mu(x) = \sum_{j=1}^n |a_j|^2$

for every $\epsilon \in \{1, -1\}^n$. In particular, the average of the left side of (99.2) over
$\epsilon \in \{1, -1\}^n$ has the same value, as in (98.6).

Suppose that $\phi_1, \phi_2, \ldots$ is an orthonormal basis for $L^2(X)$, and that the $\phi_j$'s
are uniformly bounded on $X$, as in the previous section. Thus every function
in $L^2(X)$ can be approximated in the $L^2$ norm by a finite sum of the form

(99.3)    $\sum_{j=1}^n a_j \phi_j(x).$

Moreover, the average of the $L^p$ norms of

(99.4)    $\sum_{j=1}^n \epsilon_j a_j \phi_j(x)$

over $\epsilon \in \{1, -1\}^n$ is bounded by a constant multiple of the $L^2$ norm for every
$p < \infty$, as before. However, this does not mean that the $L^p$ norm of (99.4) is
bounded by a multiple of the $L^2$ norm for every $\epsilon \in \{1, -1\}^n$, or even for only
$\epsilon = (1, \ldots, 1)$. If we start with a function in $L^2(X)$ which is not in $L^p(X)$ for some
$p > 2$, then the $L^p$ norms of its approximations are necessarily unbounded. Note
that Fourier series and Walsh functions are examples of this type of situation.
Lacunary series and Rademacher functions correspond to subsets of these bases
for which the $L^p$ norms are bounded by constant multiples of the $L^2$ norms
when $2 < p < \infty$.



100 The unit square 

Let $X = [0, 1) \times [0, 1)$ be the version of the unit square associated to dyadic
intervals, equipped with 2-dimensional Lebesgue measure. If $I, L \subseteq [0, 1)$ are
dyadic intervals with the same length $2^{-j}$, then their Cartesian product $I \times L$ is
a dyadic square in $X$ with side length $2^{-j}$ and area $2^{-2j}$. There are $2^{2j}$ dyadic
squares in $X$ with side length $2^{-j}$, they are pairwise disjoint, and their union
is equal to $X$. Let $\mathcal{A}_j$ be the collection of subsets of $X$ which can be expressed
as unions of dyadic squares with side length $2^{-j}$, including the empty set. This
is the same as the $\sigma$-algebra of subsets of $X$ generated by the partition $\mathcal{P}_j$ of
$X$ into dyadic squares of side length $2^{-j}$, as in Section 77. Note that $\mathcal{A}_j$ is a
$\sigma$-subalgebra of the $\sigma$-algebra of Borel subsets of $X$, and that $\mathcal{A}_j \subseteq \mathcal{A}_{j+1}$ for
each $j$. As usual, a function on $X$ is measurable with respect to $\mathcal{A}_j$ if and only
if it is constant on dyadic squares with side length $2^{-j}$.
Let $f_j(x, y)$ be the function on $X$ defined by

(100.1)    $f_j(x, y) = 2^j$

when $x$, $y$ are contained in the same dyadic interval of length $2^{-j}$, and

(100.2)    $f_j(x, y) = 0$

when $x$, $y$ are contained in distinct dyadic intervals of length $2^{-j}$. In particular,

(100.3)    $\int_{I \times I} f_j(x, y) \, dx \, dy = 2^{-j}$

for each dyadic interval $I$ of length $2^{-j}$. Summing over $I$, we get that

(100.4)    $\int_{[0,1) \times [0,1)} f_j(x, y) \, dx \, dy = 1$

for each $j$, because there are $2^j$ dyadic intervals of length $2^{-j}$. Clearly $f_j(x, y)$
is measurable with respect to $\mathcal{A}_j$ for each $j$. It is easy to see that

(100.5)    $f_j = E(f_{j+1} \mid \mathcal{A}_j)$

for each $j$, so that $\{f_j\}_j$ is a martingale with respect to the $\mathcal{A}_j$'s.
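
The martingale property (100.5) amounts to averaging $f_{j+1}$ over the four dyadic subsquares of each dyadic square of side length $2^{-j}$, and (100.4) to a finite sum. The sketch below carries out both computations exactly; the indexing of dyadic intervals by integers `k`, `l` and the helper names are illustrative choices rather than notation from the notes.

```python
from fractions import Fraction

def f(j, x, y):
    """f_j(x, y) = 2^j if x, y lie in the same dyadic interval of length 2^-j, else 0."""
    same = int(x * 2 ** j) == int(y * 2 ** j)
    return 2 ** j if same else 0

def cond_exp_next(j, k, l):
    """Value of E(f_{j+1} | A_j) on the dyadic square I_k x I_l of side 2^-j."""
    h = Fraction(1, 2 ** (j + 2))          # offset to the centre of a child interval
    xs = [Fraction(k, 2 ** j) + h, Fraction(k, 2 ** j) + 3 * h]
    ys = [Fraction(l, 2 ** j) + h, Fraction(l, 2 ** j) + 3 * h]
    # f_{j+1} is constant on each of the 4 children, which have equal area
    return Fraction(sum(f(j + 1, x, y) for x in xs for y in ys), 4)

def integral(j):
    """Integral of f_j over the unit square, summing over dyadic squares of side 2^-j."""
    h = Fraction(1, 2 ** (j + 1))          # offset to the centre of an interval
    centres = [Fraction(k, 2 ** j) + h for k in range(2 ** j)]
    return sum(Fraction(f(j, x, y), 4 ** j) for x in centres for y in centres)
```

On a diagonal square ($k = l$) two of the four children are diagonal squares of the next generation, where $f_{j+1} = 2^{j+1}$, and the other two contribute 0, so the average is $2^j$.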
Let $\nu$ be the Borel measure on $X$ defined by

(100.6)    $\nu(A) = |\{x \in [0, 1) : (x, x) \in A\}|,$

where $|E|$ denotes the Lebesgue measure of $E \subseteq [0, 1)$. Alternatively, if

(100.7)    $\Delta = \{(x, x) : x \in [0, 1)\}$

is the diagonal in $X$, then

(100.8)    $\nu(A) = |\pi(A \cap \Delta)|,$

where $\pi(x, x) = x$ is the natural projection of $\Delta$ onto $[0, 1)$. Of course, the
restriction of $\nu$ to $\mathcal{A}_j$ is absolutely continuous with respect to the restriction of
2-dimensional Lebesgue measure to $\mathcal{A}_j$ for each $j$. One can also think of $f_j$ as
the conditional expectation of $\nu$ with respect to $\mathcal{A}_j$, as in Section 55.

If $x, y \in [0, 1)$ and $x \ne y$, then $f_j(x, y) = 0$ for all sufficiently large $j$. In
particular, $\{f_j(x, y)\}_j$ converges to 0 almost everywhere on $X$. Basically, $\{f_j\}_j$
converges to $\nu$ in a suitable weak sense.

Now let $\mathcal{B}_j$ be the collection of subsets of $X$ that can be expressed as the
union of sets of the form $I \times A(I)$, where $I$ runs through the dyadic subintervals of
$[0, 1)$ of length $2^{-j}$, and $A(I)$ is a Borel set in $[0, 1)$ for each such $I$. Equivalently,
$A \in \mathcal{B}_j$ if for each dyadic interval $I \subseteq [0, 1)$ with $|I| = 2^{-j}$ there is a Borel set
$A(I) \subseteq [0, 1)$ such that

(100.9)    $A \cap (I \times [0, 1)) = I \times A(I).$

Thus $\mathcal{B}_j$ is a $\sigma$-subalgebra of the $\sigma$-algebra of Borel sets in $X$, $\mathcal{A}_j \subseteq \mathcal{B}_j$, and
$\mathcal{B}_j \subseteq \mathcal{B}_{j+1}$ for each $j$. A function $f(x, y)$ on $X$ is measurable with respect to
$\mathcal{B}_j$ if and only if it is constant in $x$ on each dyadic interval $I$ of length $2^{-j}$ and
Borel measurable in $y$.

In particular, $f_j(x,y)$ is measurable with respect to $\mathcal{B}_j$ for each $j$. One can
also check that

(100.10)    $E(f_{j+1} \mid \mathcal{B}_j) = f_j$

for each $j$, so that $\{f_j\}_j$ is a martingale with respect to the $\mathcal{B}_j$'s as well. The
main point is that

(100.11)    $\int_{I \times A} f_j(x,y) \, dx \, dy = |A \cap I|$

for each dyadic interval $I$ of length $2^{-j}$ and Borel set $A \subseteq [0,1)$. Similarly, if
$I_1$, $I_2$ are the dyadic intervals of length $2^{-j-1}$ such that $I = I_1 \cup I_2$, then

(100.12)    $\int_{I \times A} f_{j+1}(x,y) \, dx \, dy$
            $= \int_{I_1 \times A} f_{j+1}(x,y) \, dx \, dy + \int_{I_2 \times A} f_{j+1}(x,y) \, dx \, dy$
            $= |A \cap I_1| + |A \cap I_2| = |A \cap I|$.

This implies (100.10), which can also be seen by viewing $f_j$ as the conditional
expectation of $\nu$ with respect to $\mathcal{B}_j$, by (100.11).
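The identities (100.10) and (100.11) can be checked exactly on a computer. The following is only a sketch, under the assumption that $f_j(x,y) = 2^j$ when $x$ and $y$ lie in the same dyadic interval of length $2^{-j}$ and $f_j(x,y) = 0$ otherwise (this form is consistent with (100.11), but the definition itself is given earlier in the section). The function names are hypothetical and chosen for this illustration.

```python
from fractions import Fraction


def f(j, x, y):
    """Assumed form of f_j: 2**j on the dyadic squares along the diagonal, else 0."""
    scale = 2 ** j
    return Fraction(scale) if int(x * scale) == int(y * scale) else Fraction(0)


def cond_exp_Bj(j, x, y):
    """E(f_{j+1} | B_j)(x, y): average of f_{j+1}(., y) over the dyadic interval I
    of length 2**-j containing x. Since f_{j+1}(., y) is constant on each half of I,
    the average is the mean of its values at the midpoints of the two halves."""
    scale = 2 ** j
    left = Fraction(int(x * scale), scale)        # left endpoint of I
    quarter = Fraction(1, 4 * scale)
    m1, m2 = left + quarter, left + 3 * quarter   # midpoints of I_1 and I_2
    return (f(j + 1, m1, y) + f(j + 1, m2, y)) / 2


def integral_over_rectangle(j, I, A, m):
    """Exact integral of f_j over I x A, where I and A are intervals whose
    endpoints are multiples of 2**-m and m >= j, so that f_j is constant on
    every grid cell of side 2**-m."""
    step = Fraction(1, 2 ** m)
    total = Fraction(0)
    x = I[0] + step / 2
    while x < I[1]:
        y = A[0] + step / 2
        while y < A[1]:
            total += f(j, x, y) * step * step
            y += step
        x += step
    return total
```

For instance, with $j = 2$, $I = [1/4, 1/2)$ and $A = [3/8, 3/4)$, the rectangle integral should equal $|A \cap I| = 1/8$, in agreement with (100.11).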

Let $\Phi_j$ be the function on $[0,1)$ with values in $L^1([0,1))$ defined by

(100.13)    $\Phi_j(x)(y) = f_j(x,y)$.

This may be considered as a martingale on $[0,1)$ with values in $L^1([0,1))$, with
respect to the usual filtration associated to dyadic intervals of length $2^{-j}$. Note
that the $L^1$ norm of $\Phi_j(x)$ is equal to 1 for each $x$ and $j$, but $\{\Phi_j(x)\}_j$ does
not converge in $L^1([0,1))$ for any $x \in [0,1)$. If we identify integrable functions
on $[0,1)$ with absolutely continuous Borel measures on $[0,1]$, which determine
bounded linear functionals on the space of continuous functions on $[0,1]$ with
respect to the supremum norm, then $\{\Phi_j(x)\}_j$ converges in the weak* topology
to the Dirac mass at $x$.
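The failure of convergence in $L^1$ can be made explicit. The following worked computation is a sketch under the assumption that $f_j(x,y) = 2^j$ when $x$ and $y$ lie in the same dyadic interval of length $2^{-j}$, and $f_j(x,y) = 0$ otherwise. The notation $I_j(x)$ for the dyadic interval of length $2^{-j}$ containing $x$ is introduced only for this illustration.

```latex
\begin{align*}
\Phi_j(x) &= 2^j \, \mathbf{1}_{I_j(x)}, \qquad I_{j+1}(x) \subseteq I_j(x), \\
\|\Phi_{j+1}(x) - \Phi_j(x)\|_{L^1([0,1))}
  &= (2^{j+1} - 2^j) \, |I_{j+1}(x)| + 2^j \, |I_j(x) \setminus I_{j+1}(x)| \\
  &= 2^j \cdot 2^{-j-1} + 2^j \cdot 2^{-j-1} = 1 .
\end{align*}
```

Consecutive terms therefore stay at distance 1 from each other in $L^1([0,1))$, so the sequence cannot be a Cauchy sequence there.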



101 Partitions and products 

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2)$ be probability spaces, and let $X = X_1 \times X_2$ be
their Cartesian product, with the product probability measure $\mu = \mu_1 \times \mu_2$.
Suppose that $\mathcal{P}_1$, $\mathcal{P}_2$ are partitions of $X_1$, $X_2$ into finitely or countably many
measurable sets, respectively, as in Section 77. The corresponding product
partition $\mathcal{P}_{1,2}$ of $X$ consists of all products $A_1 \times A_2$, with $A_1 \in \mathcal{P}_1$ and $A_2 \in \mathcal{P}_2$.
It is easy to see that this is a partition of $X$ into finitely or countably many
measurable sets, and that the $\sigma$-algebra generated by $\mathcal{P}_{1,2}$ is the same as the
one associated to the $\sigma$-algebras generated by $\mathcal{P}_1$, $\mathcal{P}_2$ in the product space. A
function $f(x_1, x_2)$ on $X$ is measurable with respect to this $\sigma$-algebra if and only
if it is constant on $A_1 \times A_2$ for each $A_1 \in \mathcal{P}_1$ and $A_2 \in \mathcal{P}_2$.

Now let $\mathcal{P}_1$ be a partition of $X_1$ into finitely or countably many measurable
sets, and let $\mathcal{B}_2$ be a $\sigma$-subalgebra of $\mathcal{A}_2$. This leads to a $\sigma$-subalgebra $\mathcal{B}_{1,2}$ of
the $\sigma$-algebra of measurable subsets of $X$ associated to the $\sigma$-algebra generated
by $\mathcal{P}_1$ and $\mathcal{B}_2$ in the product space. As in the special case described in the
previous section, $\mathcal{B}_{1,2}$ consists of the sets $A \subseteq X$ such that for each $A_1 \in \mathcal{P}_1$
there is an $A_2 \in \mathcal{B}_2$ such that

(101.1)    $A \cap (A_1 \times X_2) = A_1 \times A_2$.

Equivalently, $A \in \mathcal{B}_{1,2}$ if $A$ can be expressed as a union of sets of the form
$A_1 \times A_2$, where $A_1$ runs through the elements of $\mathcal{P}_1$, and $A_2 \in \mathcal{B}_2$ for each
$A_1 \in \mathcal{P}_1$. Thus a function $f(x_1, x_2)$ on $X$ is measurable with respect to $\mathcal{B}_{1,2}$ if
it is constant in $x_1$ on each $A_1 \in \mathcal{P}_1$, and measurable in $x_2$ with respect to $\mathcal{B}_2$
for each $x_1 \in X_1$.
As in Section 77, it will be convenient to ask that $\mu_1(A_1) > 0$ for each
$A_1 \in \mathcal{P}_1$. If $\mathcal{B}_2 = \mathcal{A}_2$ and $f$ is an integrable function on $X$, then the conditional
expectation of $f$ with respect to $\mathcal{B}_{1,2}$ is given by

(101.2)    $E(f \mid \mathcal{B}_{1,2})(x_1, x_2) = \frac{1}{\mu_1(A_1)} \int_{A_1} f(t, x_2) \, d\mu_1(t)$

when $x_1 \in A_1 \in \mathcal{P}_1$. This can be seen as a combination of the conditional
expectations associated to partitions and product spaces, as in Sections 75 and
77. If $\mathcal{B}_2$ is any $\sigma$-subalgebra of $\mathcal{A}_2$, then $E(f \mid \mathcal{B}_{1,2})$ can be obtained by first
averaging $f(x_1, x_2)$ over $x_1 \in A_1$ for each $A_1 \in \mathcal{P}_1$, as before, and then taking
the conditional expectation of the resulting functions of $x_2$ with respect to $\mathcal{B}_2$.
In this case, $\mathcal{B}_{1,2}$ is a $\sigma$-subalgebra of the $\sigma$-algebra associated to $\mathcal{P}_1$ and $\mathcal{A}_2$.



102 Partitions and vectors 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{P}$ be a partition of $X$ into finitely or
countably many measurable sets, as in Section 77. As usual, it will be convenient
to ask that $\mu(A) > 0$ for each $A \in \mathcal{P}$. Also let $\mathcal{B}(\mathcal{P})$ be the $\sigma$-subalgebra of $\mathcal{A}$
generated by $\mathcal{P}$, consisting of unions of elements of $\mathcal{P}$, including the empty set.
Thus a function on $X$ is measurable with respect to $\mathcal{B}(\mathcal{P})$ if and only if it is
constant on the elements of $\mathcal{P}$.

Let $V$ be a real or complex vector space with a norm $\|v\|$, and let $f(x)$ be a
$V$-valued function on $X$ that is constant on the elements of $\mathcal{P}$. In particular, $\|f(x)\|$
is a nonnegative real-valued function on $X$ that is constant on the elements of
$\mathcal{P}$. If $f(A)$ denotes the value of $f$ on $A \in \mathcal{P}$, then

(102.1)    $\int_X \|f(x)\| \, d\mu(x) = \sum_{A \in \mathcal{P}} \|f(A)\| \, \mu(A)$.

More precisely, if $\mathcal{P}$ is a partition of $X$ into finitely many sets, then the sum
on the right is a finite sum, and $\|f(x)\|$ is automatically integrable on $X$. If $\mathcal{P}$
consists of infinitely many measurable subsets of $X$, then the sum on the right
is interpreted as the supremum of the corresponding sums over finite subsets of
$\mathcal{P}$, which may be infinite.

If $\mathcal{P}$ has only finitely many elements, then we can put

(102.2)    $\int_X f(x) \, d\mu(x) = \sum_{A \in \mathcal{P}} f(A) \, \mu(A)$.

This also makes sense when $\mathcal{P}$ has infinitely many elements, $\|f(x)\|$ is integrable
on $X$, and $V$ is complete. In this case, the sum on the right side of (102.1) is
finite, and the sum on the right side of (102.2) converges in the generalized
sense, as in Section 15. In both cases,

(102.3)    $\left\| \int_X f(x) \, d\mu(x) \right\| \le \int_X \|f(x)\| \, d\mu(x)$.
Similarly, if $B \in \mathcal{B}(\mathcal{P})$, then we would like to put

(102.4)    $\int_B f(x) \, d\mu(x) = \sum_{A \in \mathcal{P}, \, A \subseteq B} f(A) \, \mu(A)$.

As before, this makes sense when $B$ is the union of finitely many elements of $\mathcal{P}$,
and when $B$ contains infinitely many elements of $\mathcal{P}$, $\|f(x)\|$ is integrable, and
$V$ is complete. We also have the analogue of (102.3) with $B$ in place of $X$.

Using the Bochner integral, one can integrate much more complicated vector-valued
functions. We shall restrict our attention here to sums over partitions
for the sake of simplicity.
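For a finite partition, (102.1), (102.2) and (102.3) are finite sums that can be computed directly. The following is a minimal sketch, assuming $V = \mathbf{R}^n$ with the $\ell^1$ norm and exact rational weights; the dictionary representation and the function names are hypothetical choices for this illustration.

```python
from fractions import Fraction


def integrate(values, weights):
    """Vector-valued integral as in (102.2): sum of f(A) * mu(A) over the partition.
    `values` maps each partition element to a tuple (the value f(A) in R^n),
    `weights` maps each partition element to its measure mu(A)."""
    dim = len(next(iter(values.values())))
    total = [Fraction(0)] * dim
    for A, v in values.items():
        for i in range(dim):
            total[i] += v[i] * weights[A]
    return tuple(total)


def l1_norm(v):
    """The l^1 norm on R^n, used here as the norm on V."""
    return sum(abs(c) for c in v)


def integrate_norm(values, weights, norm=l1_norm):
    """Integral of ||f(x)|| as in (102.1): sum of ||f(A)|| * mu(A)."""
    return sum(norm(v) * weights[A] for A, v in values.items())


def satisfies_norm_inequality(values, weights, norm=l1_norm):
    """Check (102.3): the norm of the integral is at most the integral of the norm."""
    return norm(integrate(values, weights)) <= integrate_norm(values, weights, norm)
```

With three partition elements of measure 1/2, 1/3, 1/6 and values (1, -2), (-3, 0), (6, 6), the integral is (1/2, 0), while the integral of the norm is 9/2, so (102.3) holds with room to spare.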



103 Vector-valued martingales 

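Let $(X, \mathcal{A}, \mu)$ be a probability space, and suppose that $\mathcal{P}_1, \mathcal{P}_2, \ldots$ is a sequence
of partitions of $X$ into finitely or countably many measurable subsets such that
$\mathcal{P}_{j+1}$ is a refinement of $\mathcal{P}_j$ for each $j$. This means that each $B \in \mathcal{P}_j$ is the union
of the $A \in \mathcal{P}_{j+1}$ such that $A \subseteq B$. If $\mathcal{B}_j = \mathcal{B}(\mathcal{P}_j)$ is the $\sigma$-algebra generated by
$\mathcal{P}_j$, then it follows that $\mathcal{B}_j \subseteq \mathcal{B}_{j+1}$ for each $j$. As usual, it is convenient to ask
that $\mu(A) > 0$ for each $A \in \mathcal{P}_j$.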

Let $V$ be a real or complex vector space with a norm $\|v\|$, and let $f_l$ be a
$V$-valued function on $X$ that is constant on elements of $\mathcal{P}_l$. We would like to
define the conditional expectation of $f_l$ with respect to $\mathcal{B}_j$ for $j \le l$ by

(103.1)    $E(f_l \mid \mathcal{B}_j)(x) = \frac{1}{\mu(B)} \int_B f_l \, d\mu = \sum_{A \in \mathcal{P}_l, \, A \subseteq B} f_l(A) \, \frac{\mu(A)}{\mu(B)}$

when $x \in B \in \mathcal{P}_j$, where $f_l(A)$ denotes the value of $f_l$ on $A \in \mathcal{P}_l$, as in the
previous section. This makes sense when each $B \in \mathcal{P}_j$ is the union of finitely
many $A \in \mathcal{P}_l$, and when $\|f_l\|$ is integrable and $V$ is complete. In both cases, it
is easy to see that

(103.2)    $\|E(f_l \mid \mathcal{B}_j)\| \le E(\|f_l\| \mid \mathcal{B}_j)$.

If $j \le k \le l$, then one can also check that

(103.3)    $E(E(f_l \mid \mathcal{B}_k) \mid \mathcal{B}_j) = E(f_l \mid \mathcal{B}_j)$,

under these conditions, just as in the context of real or complex-valued functions.

Now let $\{f_j\}_{j=1}^\infty$ be a sequence of $V$-valued functions on $X$ such that $f_j$ is
constant on the elements of $\mathcal{P}_j$ for each $j$. As usual, $\{f_j\}_{j=1}^\infty$ is said to be a
martingale with respect to this filtration if

(103.4)    $f_j = E(f_l \mid \mathcal{B}_j)$

for each $j \le l$. More precisely, this makes sense when each element of $\mathcal{P}_j$ is the
union of finitely many elements of $\mathcal{P}_l$, and when each $\|f_l\|$ is integrable and $V$
is complete. Note that (103.4) holds for all $j < l$ when it holds for $l = j + 1$,
because of (103.3).

Of course, the simplest type of situation occurs when $\mathcal{P}_j$ consists of only
finitely many measurable subsets of $X$ for each $j$. All of the sums involved in
the conditional expectations are then finite sums, and the functions $\|f_l\|$ are
automatically bounded.
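In the finite case, the conditional expectation (103.1) is a weighted average over blocks, and (103.2) and (103.3) can be verified exactly. The sketch below assumes that $X$ is a finite set whose points are the elements of the finest partition, that $V = \mathbf{R}^n$ with the $\ell^1$ norm, and that partitions are given as lists of blocks; these representations and names are hypothetical.

```python
from fractions import Fraction


def cond_exp(f, partition, mu):
    """Conditional expectation as in (103.1): on each block B of `partition`,
    replace f by its mu-weighted average over B.
    `f` maps points to tuples (vectors), `mu` maps points to positive weights."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        dim = len(f[block[0]])
        avg = tuple(sum(f[x][i] * mu[x] for x in block) / mass for i in range(dim))
        for x in block:
            out[x] = avg
    return out


def pointwise_norm(f):
    """The function x -> ||f(x)|| (l^1 norm), returned as a 1-component vector function."""
    return {x: (sum(abs(c) for c in v),) for x, v in f.items()}


# Example data: eight points with weights (x + 1) / 36, which sum to 1.
points = list(range(8))
mu = {x: Fraction(x + 1, 36) for x in points}
f3 = {x: (x * x, 1 - x) for x in points}
P1 = [[0, 1, 2, 3], [4, 5, 6, 7]]
P2 = [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Here `cond_exp(cond_exp(f3, P2, mu), P1, mu)` and `cond_exp(f3, P1, mu)` should agree exactly, which is the tower property (103.3) for this example.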

104 $L^1$-valued martingales

Let us continue with the same notations and hypotheses as in the previous
section. As in Section 100, we can get an example of a $V$-valued martingale on
$X$ with $V = L^1(X)$ by taking

(104.1)    $f_l(A) = \mu(A)^{-1} \, \mathbf{1}_A$

for each $A \in \mathcal{P}_l$. Here $\mathbf{1}_A$ denotes the indicator function associated to $A$ on $X$,
equal to 1 on $A$ and 0 on $X \setminus A$, as usual. Thus $\|f_l(x)\|_1 = 1$ for every $x \in X$
and $l \ge 1$, and it is easy to check that (103.4) holds.

Now let $(Y, \mathcal{B}, \nu)$ be a $\sigma$-finite measure space, and let us consider functions
on $X$ with values in $V = L^1(Y)$. If $f_l(x)$ is an $L^1(Y)$-valued function on $X$ that
is constant on the elements of $\mathcal{P}_l$, then

(104.2)    $F_l(x,y) = f_l(x)(y)$

defines a function on $X \times Y$ that is constant in $x$ on each element of $\mathcal{P}_l$ and
measurable in $y$ for each $x \in X$. If $\|f_l(x)\|_{L^1(Y)}$ is integrable on $X$, then $F_l(x,y)$
is integrable on $X \times Y$, and

(104.3)    $\int_X \|f_l(x)\|_{L^1(Y)} \, d\mu(x) = \int_X \left( \int_Y |F_l(x,y)| \, d\nu(y) \right) d\mu(x)$
           $= \int_{X \times Y} |F_l(x,y)| \, d(\mu \times \nu)(x,y)$.

Conversely, if $F_l(x,y)$ is an integrable function on $X \times Y$ that is constant in $x$
on each element of $\mathcal{P}_l$, then we get an $L^1(Y)$-valued function $f_l(x)$ on $X$ that is
constant on each element of $\mathcal{P}_l$ and for which $\|f_l(x)\|_{L^1(Y)}$ is integrable on $X$.

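Let $\widetilde{\mathcal{B}}_l$ be the $\sigma$-algebra of subsets of $X \times Y$ that corresponds to $\mathcal{B}_l = \mathcal{B}(\mathcal{P}_l)$
on $X$ and $\mathcal{B}$ on $Y$ in the product space. As in Section 101, a set $E \subseteq X \times Y$ is
in $\widetilde{\mathcal{B}}_l$ if and only if for each $A \in \mathcal{P}_l$ there is a $B \in \mathcal{B}$ such that

(104.4)    $E \cap (A \times Y) = A \times B$.

Equivalently, $E \in \widetilde{\mathcal{B}}_l$ if it can be expressed as the union of sets of the form
$A \times B(A)$, where $A$ runs through the elements of $\mathcal{P}_l$, and $B(A) \in \mathcal{B}$ for each
$A \in \mathcal{P}_l$. In the context of the preceding paragraph, the functions $F_l(x,y)$ are
measurable with respect to $\widetilde{\mathcal{B}}_l$.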
Suppose that $\{f_l\}_{l=1}^\infty$ is a sequence of $L^1(Y)$-valued functions on $X$ such that
$f_l(x)$ is constant on each element of $\mathcal{P}_l$ and $\|f_l(x)\|_{L^1(Y)}$ is integrable on $X$ for
each $l$. This corresponds exactly to a sequence $\{F_l\}_{l=1}^\infty$ of integrable functions
on $X \times Y$ such that $F_l(x,y)$ is measurable with respect to $\widetilde{\mathcal{B}}_l$, the $\sigma$-algebra on
$X \times Y$ that corresponds to $\mathcal{B}_l$ on $X$ and $\mathcal{B}$ on $Y$, for each $l$, as in the
previous paragraphs. If $Y$ is a probability space, then $X \times Y$ is also a probability
space, and it is easy to see that $\{f_l\}_{l=1}^\infty$ is an $L^1(Y)$-valued martingale on $X$
with respect to the $\mathcal{B}_l$'s if and only if $\{F_l\}_{l=1}^\infty$ is a martingale on $X \times Y$ with
respect to the $\widetilde{\mathcal{B}}_l$'s. This basically works as well when $Y$ is $\sigma$-finite, by extending
the relevant definitions in a natural way.



105 Pointwise convergence 

Let us continue with the same notation and hypotheses as in Section 103, with
the additional condition that $V$ be complete. Suppose that $\{f_j\}_{j=1}^\infty$ is a sequence
of $V$-valued functions on $X$ such that $f_j(x)$ is constant on each element of $\mathcal{P}_j$,
$\|f_j(x)\|$ is integrable on $X$ for each $j$, and $\{f_j\}_{j=1}^\infty$ is a martingale with respect
to $\mathcal{B}_j = \mathcal{B}(\mathcal{P}_j)$. If

(105.1)    $f_n^*(x) = \max_{1 \le j \le n} \|f_j(x)\|$

is the usual maximal function and

(105.2)    $A_n(t) = \{x \in X : f_n^*(x) > t\}$

for each $t > 0$, then

(105.3)    $t \, \mu(A_n(t)) \le \int_X \|f_n(x)\| \, d\mu(x)$

for every $t > 0$ and $n \ge 1$. This can be shown in the standard way. In particular,
one can use the fact that $\{\|f_j(x)\|\}_{j=1}^\infty$ is a submartingale, because of (103.2).
Suppose now that $\|f_n(x)\|$ has uniformly bounded $L^1$ norm, and put

(105.4)    $f^*(x) = \sup_{j \ge 1} \|f_j(x)\|$.

If

(105.5)    $A(t) = \{x \in X : f^*(x) > t\}$

for each $t > 0$, then

(105.6)    $A(t) = \bigcup_{n=1}^\infty A_n(t)$,

and of course $A_n(t) \subseteq A_{n+1}(t)$. It follows that

(105.7)    $t \, \mu(A(t)) \le \sup_{n \ge 1} \int_X \|f_n(x)\| \, d\mu(x)$

for each $t > 0$, by taking the limit as $n \to \infty$ in (105.3).
As in Section 53, we can also consider $\{f_j - f_l\}_{j=l}^\infty$ as a $V$-valued martingale
on $X$ with respect to the $\mathcal{B}_j$'s with $j \ge l$. If

(105.8)    $B_l(t) = \{x \in X : \sup_{j \ge l} \|f_j(x) - f_l(x)\| > t\}$,

then we get that

(105.9)    $t \, \mu(B_l(t)) \le \sup_{j \ge l} \int_X \|f_j(x) - f_l(x)\| \, d\mu(x)$

for each $t > 0$ and $l \ge 1$. Hence

(105.10)    $t \, \mu\left( \bigcap_{l=1}^\infty B_l(t) \right) \le \lim_{l \to \infty} \sup_{j \ge l} \int_X \|f_j(x) - f_l(x)\| \, d\mu(x)$

for each $t > 0$.
Suppose that

(105.11)    $\lim_{l \to \infty} \sup_{j \ge l} \int_X \|f_j(x) - f_l(x)\| \, d\mu(x) = 0$,

which means that $\{f_j\}_{j=1}^\infty$ is a Cauchy sequence with respect to the $L^1$ norm
for $V$-valued functions on $X$. This together with (105.10) implies that

(105.12)    $\mu\left( \bigcap_{l=1}^\infty B_l(t) \right) = 0$

for every $t > 0$. Of course,

(105.13)    $X \setminus \left( \bigcap_{l=1}^\infty B_l(t) \right) = \{x \in X : \sup_{j \ge l} \|f_j(x) - f_l(x)\| \le t$ for some $l \in \mathbf{Z}_+\}$,

and it follows that

(105.14)    $\lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| = 0$

for almost every $x \in X$, by taking $t = 1/n$ for $n \in \mathbf{Z}_+$. This shows that
$\{f_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in $V$ for almost every $x \in X$, and hence that
$\{f_j(x)\}_{j=1}^\infty$ converges for almost every $x \in X$, because $V$ is complete. Thus this
criterion for convergence almost everywhere works as well in the vector-valued
case as for real or complex-valued functions.

106 Another scenario 

Let $(X_1, \mathcal{A}_1, \mu_1), (X_2, \mathcal{A}_2, \mu_2), \ldots$ be a sequence of probability spaces, and let
$X = \prod_{j=1}^\infty X_j$ be their Cartesian product, with the product measure $\mu$. As
usual, let $\mathcal{B}_n$ be the $\sigma$-subalgebra of the $\sigma$-algebra of measurable subsets of $X$
consisting of the sets of the form

(106.1)    $A \times \prod_{j=n+1}^\infty X_j$,

where $A$ is a measurable subset of $\prod_{j=1}^n X_j$. If each $X_j$ has only finitely or
countably many elements, and every subset of $X_j$ is measurable, then $\mathcal{B}_n$ consists
of the sets of the form (106.1), where $A$ is any subset of $\prod_{j=1}^n X_j$. In this case,
$\mathcal{B}_n$ is the $\sigma$-algebra generated by the partition $\mathcal{P}_n$ of subsets of $X$ of the form
(106.1), where $A \subseteq \prod_{j=1}^n X_j$ has exactly one element.

Let $a_1(x_1), a_2(x_2), \ldots$ be a sequence of integrable real or complex-valued
functions on $X_1, X_2, \ldots$ such that

(106.2)    $\int_{X_j} a_j(x_j) \, d\mu_j(x_j) = 0$

for each $j$. Also let $V$ be a real or complex vector space with a norm $\|v\|$, and let
$v_1, v_2, \ldots$ be a sequence of elements of $V$. Under these conditions, it is natural
to consider

(106.3)    $f_n(x) = \sum_{j=1}^n a_j(x_j) \, v_j$

as a $V$-valued martingale on $X$ with respect to the $\mathcal{B}_n$'s. In this case, it is
very easy to understand the meaning of the vector-valued integrals, because of
the special form of the functions. This is also consistent with the discussion in
Section 103 when the $X_j$'s have only finitely or countably many elements, and
all of their subsets are measurable.

By construction, $f_n(x)$ takes values in a linear subspace of $V$ with dimension
less than or equal to $n$ for each $n \in \mathbf{Z}_+$. Thus one can identify $f_n$ with a function
on $X$ with values in $\mathbf{R}^n$ or $\mathbf{C}^n$ whose components are measurable. One can also
check that $\|f_n(x)\|$ is measurable as a nonnegative real-valued function on $X$,
using the fact that any norm on $\mathbf{R}^n$ or $\mathbf{C}^n$ is bounded by a constant multiple
of the standard norm, and hence is continuous with respect to the standard
topology. Moreover, $\{\|f_n(x)\|\}_{n=1}^\infty$ is a submartingale with respect to the $\mathcal{B}_n$'s,
basically because the norm of the integral of a $V$-valued function is less than or
equal to the integral of the norm of the function.

Suppose that $\|f_n(x)\|$ has uniformly bounded $L^1$ norm, and let

(106.4)    $f^*(x) = \sup_{n \ge 1} \|f_n(x)\|$

be the corresponding maximal function. As in the previous section,

(106.5)    $t \, \mu(\{x \in X : f^*(x) > t\}) \le \sup_{n \ge 1} \int_X \|f_n\| \, d\mu$

for every $t > 0$. This permits one to show that

(106.6)    $\lim_{n \to \infty} \sup_{l \ge n} \|f_l(x) - f_n(x)\| = 0$

for almost every $x \in X$ when

(106.7)    $\lim_{n \to \infty} \sup_{l \ge n} \int_X \|f_l - f_n\| \, d\mu = 0$,

as before. Hence $\{f_n(x)\}_{n=1}^\infty$ is a Cauchy sequence in $V$ for almost every $x$ in
$X$ under these conditions. If $V$ is complete, then it follows that $\{f_n(x)\}_{n=1}^\infty$
converges for almost every $x \in X$.
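A simple instance of this scenario can be enumerated completely. The sketch below assumes $X_j = \{-1, 1\}$ with each point of measure 1/2, $a_j(x_j) = x_j$ (so that (106.2) holds), and $V = \mathbf{R}^2$ with the Euclidean norm. It compares the two sides of the maximal inequality for the finite maximal function $\max_{1 \le k \le n} \|f_k\|$; the function names and the sample vectors are hypothetical.

```python
import itertools
import math


def partial_sum_norms(signs, vectors):
    """Norms ||f_1(x)||, ..., ||f_n(x)|| for one sign pattern x, where
    f_k(x) = sum_{j <= k} x_j v_j as in (106.3) with a_j(x_j) = x_j."""
    sx = sy = 0.0
    norms = []
    for s, (vx, vy) in zip(signs, vectors):
        sx += s * vx
        sy += s * vy
        norms.append(math.hypot(sx, sy))
    return norms


def maximal_inequality_sides(vectors, t):
    """Return (t * mu{max_k ||f_k|| > t}, integral of ||f_n||), computed by
    enumerating all 2**n equally likely sign patterns."""
    n = len(vectors)
    weight = 1.0 / 2 ** n
    measure = 0.0
    integral = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        norms = partial_sum_norms(signs, vectors)
        if max(norms) > t:
            measure += weight
        integral += norms[-1] * weight
    return t * measure, integral


if __name__ == "__main__":
    sample = [(1, 0), (0, 1), (1, 1), (2, -1), (0.5, 0.5), (1, -2)]
    for t in (0.5, 1.0, 2.0, 3.0, 5.0):
        lhs, rhs = maximal_inequality_sides(sample, t)
        print(t, lhs, rhs, lhs <= rhs + 1e-12)
```

Exhaustive enumeration is only practical for small $n$, but it avoids sampling error, so any violation of the inequality would indicate a bug rather than noise.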

107 Hilbert space martingales 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and suppose that $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ is an
increasing sequence of $\sigma$-subalgebras of $\mathcal{A}$ as in Section 103 or the preceding
section. Also let $(V, \langle v, w \rangle)$ be a real or complex Hilbert space, and let $\{f_j\}_{j=1}^\infty$
be a $V$-valued martingale with respect to the $\mathcal{B}_j$'s such that $\|f_j(x)\| \in L^2(X)$
for each $j$.

As in Section 81, one can check that

(107.1)    $\int_X \langle f_j(x), f_{l+1}(x) - f_l(x) \rangle \, d\mu(x) = 0$

for each $j \le l$. If $j < l$, then we get that

(107.2)    $\int_X \langle f_{j+1}(x) - f_j(x), f_{l+1}(x) - f_l(x) \rangle \, d\mu(x) = 0$.

Using the identity $f_n = f_1 + \sum_{j=1}^{n-1} (f_{j+1} - f_j)$, it follows that

(107.3)    $\int_X \|f_n(x)\|^2 \, d\mu(x) = \int_X \|f_1(x)\|^2 \, d\mu(x) + \sum_{j=1}^{n-1} \int_X \|f_{j+1}(x) - f_j(x)\|^2 \, d\mu(x)$

for each $n$. Similarly,

(107.4)    $\int_X \|f_n(x) - f_l(x)\|^2 \, d\mu(x) = \sum_{j=l}^{n-1} \int_X \|f_{j+1}(x) - f_j(x)\|^2 \, d\mu(x)$

when $n > l$.

If $\|f_n(x)\|$ has bounded $L^2$ norm, then (107.3) implies that

(107.5)    $\sum_{j=1}^\infty \int_X \|f_{j+1}(x) - f_j(x)\|^2 \, d\mu(x) < \infty$.

Under these conditions,

(107.6)    $\lim_{l \to \infty} \sum_{j=l}^\infty \int_X \|f_{j+1}(x) - f_j(x)\|^2 \, d\mu(x) = 0$,

and hence

(107.7)    $\lim_{l \to \infty} \sup_{n \ge l} \int_X \|f_n(x) - f_l(x)\|^2 \, d\mu(x) = 0$.

In particular, $\{f_j(x)\}_{j=1}^\infty$ converges in $V$ for almost every $x \in X$, as in the
previous sections.
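The orthogonality behind (107.3) can be seen exactly in a small example. The sketch below assumes the setting of Section 106 with $X_j = \{-1, 1\}$, each point of measure 1/2, $a_j(x_j) = x_j$, and $V = \mathbf{R}^2$ with the standard inner product, so that $f_1 = x_1 v_1$ and $f_{j+1} - f_j = x_{j+1} v_{j+1}$. The right side of (107.3) is then $\sum_{j=1}^n \|v_j\|^2$. The function names are hypothetical.

```python
import itertools
from fractions import Fraction


def mean_square_norm(vectors):
    """Integral of ||f_n||^2 over all 2**n equally likely sign patterns,
    where f_n(x) = sum_j x_j v_j and the v_j have integer coordinates."""
    n = len(vectors)
    total = 0
    for signs in itertools.product((-1, 1), repeat=n):
        sx = sum(s * v[0] for s, v in zip(signs, vectors))
        sy = sum(s * v[1] for s, v in zip(signs, vectors))
        total += sx * sx + sy * sy
    return Fraction(total, 2 ** n)


def sum_of_increment_norms(vectors):
    """Right side of (107.3) in this example: the sum of ||v_j||^2."""
    return sum(v[0] ** 2 + v[1] ** 2 for v in vectors)
```

Because the arithmetic is exact, the two functions should return equal values for any list of integer vectors; the cross terms cancel when averaged over the signs, which is the content of (107.1) and (107.2) in this example.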



108 Nonnegative submartingales 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$. Also let $\{f_j\}_{j=1}^\infty$ be a submartingale with respect
to this filtration such that $f_j \ge 0$ for each $j$. This includes the case of the norm
of a vector-valued martingale, as before. If

(108.1)    $f_n^*(x) = \max_{1 \le j \le n} f_j(x)$

and

(108.2)    $A_n(t) = \{x \in X : f_n^*(x) > t\}$,

then

(108.3)    $t \, \mu(A_n(t)) \le \int_{A_n(t)} f_n \, d\mu \le \int_X f_n \, d\mu$

for each $t > 0$ and $n \ge 1$, as shown previously. If

(108.4)    $f^*(x) = \sup_{j \ge 1} f_j(x)$

and

(108.5)    $A(t) = \{x \in X : f^*(x) > t\}$,

then

(108.6)    $A(t) = \bigcup_{n=1}^\infty A_n(t)$

and

(108.7)    $t \, \mu(A(t)) \le \sup_{n \ge 1} \int_X f_n \, d\mu$

for each $t > 0$ when the $L^1$ norms of the $f_n$'s are bounded.
By hypothesis,

(108.8)    $0 \le f_j \le E(f_n \mid \mathcal{B}_j)$

when $j \le n$, and of course $E(f_n \mid \mathcal{B}_j)$ is a martingale in $j$ for each $n$. If
$f_n \in L^p(X)$, $1 < p < \infty$, then


(108.9) / (max £(/„ | Bj)Y d^J, < [ f%d^ 
Jx v i<j<« / p - 1 Jx 

as in Section 55. Hence

(108.10) / (/^rfp^ / y*d/*. 
If the $L^p$ norm of $f_n$ is uniformly bounded in $n$, then the monotone convergence
theorem implies that $f^* \in L^p$, with

(108.11) / (/*)Pd/i<^- B up / ftdp. 

JX P~ 1 n>lJx 

Thus one gets the same $L^p$ estimates for nonnegative submartingales as for
martingales.

109 $L^p$-valued martingales

As in Section 104, we can look at $L^p$-valued martingales in terms of functions
on a product space. Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{P}_1, \mathcal{P}_2, \ldots$ be a
sequence of partitions of $X$ into finitely or countably many measurable subsets
with positive measure such that $\mathcal{P}_{j+1}$ is a refinement of $\mathcal{P}_j$ for each $j$. Also let
$(Y, \mathcal{B}, \nu)$ be a $\sigma$-finite measure space, and fix $p$, $1 < p < \infty$.

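If $f_l(x)$ is an $L^p(Y)$-valued function on $X$ that is constant on the elements
of $\mathcal{P}_l$, then

(109.1)    $F_l(x,y) = f_l(x)(y)$

is a function on $X \times Y$ that is constant in $x$ on each element of $\mathcal{P}_l$ and measurable
in $y$ for each $x \in X$. If $\|f_l(x)\|_{L^p(Y)} \in L^p(X)$, then $F_l(x,y) \in L^p(X \times Y)$, and

(109.2)    $\int_X \|f_l(x)\|_{L^p(Y)}^p \, d\mu(x) = \int_X \left( \int_Y |F_l(x,y)|^p \, d\nu(y) \right) d\mu(x)$
           $= \int_{X \times Y} |F_l(x,y)|^p \, d(\mu \times \nu)(x,y)$.

Conversely, if $F_l(x,y) \in L^p(X \times Y)$ is constant in $x$ on each element of $\mathcal{P}_l$, then
we get an $L^p(Y)$-valued function $f_l(x)$ on $X$ that is constant on each element
of $\mathcal{P}_l$ and for which $\|f_l(x)\|_{L^p(Y)} \in L^p(X)$. If $\widetilde{\mathcal{B}}_l$ is the $\sigma$-algebra of subsets of
$X \times Y$ that corresponds to $\mathcal{B}_l = \mathcal{B}(\mathcal{P}_l)$ on $X$ and $\mathcal{B}$ on $Y$ as before, then $F_l(x,y)$
is measurable with respect to $\widetilde{\mathcal{B}}_l$.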

Now let $\{f_l\}_{l=1}^\infty$ be a sequence of $L^p(Y)$-valued functions on $X$ such that
$f_l(x)$ is constant on each element of $\mathcal{P}_l$ and $\|f_l(x)\|_{L^p(Y)} \in L^p(X)$ for each $l$.
This corresponds exactly to a sequence of functions $\{F_l\}_{l=1}^\infty$ in $L^p(X \times Y)$ such
that $F_l(x,y)$ is measurable with respect to $\widetilde{\mathcal{B}}_l$, the $\sigma$-algebra on $X \times Y$ that
corresponds to $\mathcal{B}_l$ on $X$ and $\mathcal{B}$ on $Y$, for each $l$, as in the preceding
paragraph. If $\{f_l\}_{l=1}^\infty$ is an $L^p(Y)$-valued martingale on $X$ with respect to the
$\mathcal{B}_l$'s and $Y$ is a probability space, then $X \times Y$ is also a probability space, and
$\{F_l\}_{l=1}^\infty$ is a martingale on $X \times Y$ with respect to the $\widetilde{\mathcal{B}}_l$'s. If the $L^p(X)$ norm
of $\|f_l(x)\|_{L^p(Y)}$ is bounded, then the $L^p(X \times Y)$ norm of $F_l(x,y)$ is bounded,
and hence $\{F_l\}_{l=1}^\infty$ converges in $L^p(X \times Y)$. In particular, $\{F_l\}_{l=1}^\infty$ is a Cauchy
sequence in $L^p(X \times Y)$, which implies that

(109.3)    $\lim_{l \to \infty} \sup_{j \ge l} \int_X \|f_j(x) - f_l(x)\|_{L^p(Y)}^p \, d\mu(x) = 0$.
Of course, the same conclusion holds when $0 < \nu(Y) < \infty$, by dividing $\nu$
by $\nu(Y)$ to get a probability space. Otherwise, let $\rho$ be a strictly positive
measurable function on $Y$ such that

(109.4)    $\int_Y \rho(y) \, d\nu(y) = 1$,

which is possible because $(Y, \mathcal{B}, \nu)$ is supposed to be $\sigma$-finite. Thus

(109.5)    $\nu_\rho(B) = \int_B \rho(y) \, d\nu(y)$

is a probability measure on $(Y, \mathcal{B})$. If $\phi(y) \in L^p(Y, \nu)$, then

(109.6)    $\phi_\rho(y) = \phi(y) \, \rho(y)^{-1/p} \in L^p(Y, \nu_\rho)$,

and

(109.7)    $\int_Y |\phi_\rho(y)|^p \, d\nu_\rho(y) = \int_Y |\phi(y)|^p \, d\nu(y)$.

Using this, one can check that (109.3) holds for any $\sigma$-finite measure space
$(Y, \mathcal{B}, \nu)$, by reducing to the probability space $(Y, \mathcal{B}, \nu_\rho)$.
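The identity (109.7) is a one-line computation from the definitions (109.5) and (109.6). The worked equation below is included only to make the cancellation of the weight explicit; it uses no notation beyond that of this section.

```latex
\begin{align*}
\int_Y |\phi_\rho(y)|^p \, d\nu_\rho(y)
  &= \int_Y |\phi(y)|^p \, \rho(y)^{-1} \, \rho(y) \, d\nu(y)
   = \int_Y |\phi(y)|^p \, d\nu(y).
\end{align*}
```

In other words, $\phi \mapsto \phi_\rho$ is a linear isometry of $L^p(Y, \nu)$ onto $L^p(Y, \nu_\rho)$, so applying it to the values of an $L^p(Y, \nu)$-valued function does not change the norms that appear in (109.3).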

110 Another criterion 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing
sequence of $\sigma$-subalgebras of $\mathcal{A}$ as in Section 103 or 106. Also let $V$ be a real
or complex Banach space with a norm $\|v\|$, and let $\{f_j\}_{j=1}^\infty$ be a $V$-valued
martingale on $X$ with respect to the $\mathcal{B}_j$'s such that $\|f_j(x)\| \in L^1(X)$ for each $j$.
Suppose that for each $\epsilon > 0$ there is a $V$-valued martingale $\{g_j\}_{j=1}^\infty$ on $X$ with
respect to the $\mathcal{B}_j$'s such that

(110.1)    $\int_X \|f_j(x) - g_j(x)\| \, d\mu(x) < \epsilon$

for each $j$, and $\{g_j(x)\}_{j=1}^\infty$ converges in $V$ for almost every $x \in X$. Let us check
that $\{f_j(x)\}_{j=1}^\infty$ converges in $V$ for almost every $x \in X$ under these conditions.
Of course, it suffices to show that

(110.2)    $\lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| = 0$

for almost every $x \in X$, so that $\{f_l(x)\}_{l=1}^\infty$ is a Cauchy sequence in $V$ for almost
every $x \in X$. Put $h_j = f_j - g_j$, so that $\{h_j\}_{j=1}^\infty$ is also a $V$-valued martingale
on $X$ with respect to the $\mathcal{B}_j$'s. Observe that

(110.3)    $\lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| \le \lim_{l \to \infty} \sup_{j \ge l} \|g_j(x) - g_l(x)\|$
           $+ \lim_{l \to \infty} \sup_{j \ge l} \|h_j(x) - h_l(x)\|$
for every $x \in X$. This implies that

(110.4)    $\lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| \le \lim_{l \to \infty} \sup_{j \ge l} \|h_j(x) - h_l(x)\|$

for almost every $x \in X$, because $\{g_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in $V$ for
almost every $x \in X$. Hence

(110.5)    $\lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| \le 2 \sup_{j \ge 1} \|h_j(x)\| = 2 \, h^*(x)$

for almost every $x \in X$.

By the usual maximal function estimate,

(110.6)    $t \, \mu(\{x \in X : h^*(x) > t\}) \le \sup_{j \ge 1} \int_X \|h_j(x)\| \, d\mu(x) \le \epsilon$

for every $t > 0$. If

(110.7)    $E(t) = \{x \in X : \lim_{l \to \infty} \sup_{j \ge l} \|f_j(x) - f_l(x)\| > 2t\}$,

then

(110.8)    $\mu(E(t)) \le \mu(\{x \in X : h^*(x) > t\})$,

by (110.5), and so

(110.9)    $t \, \mu(E(t)) \le \epsilon$

for every $\epsilon, t > 0$. Because $E(t)$ does not depend on $\epsilon$, we may conclude that
$\mu(E(t)) = 0$ for every $t > 0$. This implies that (110.2) holds for almost every
$x \in X$, as desired.

Note that this criterion is satisfied when

(110.10)    $\lim_{l \to \infty} \sup_{j \ge l} \int_X \|f_j(x) - f_l(x)\| \, d\mu(x) = 0$.

To see this, one can take $g_j(x)$ to be of the form $f_{\min(j,N)}(x)$ for large positive
integers $N$. This converges as $j \to \infty$ for each fixed $N$ trivially, and (110.1)
holds for sufficiently large $N$ by hypothesis.
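The choice $g_j = f_{\min(j,N)}$ can be unwound as follows. This is only a spelled-out version of the remark above, using the notation $h_j = f_j - g_j$ from earlier in this section.

```latex
\begin{align*}
h_j = f_j - f_{\min(j,N)} &=
  \begin{cases}
    0 & \text{when } j \le N, \\
    f_j - f_N & \text{when } j > N,
  \end{cases} \\
\sup_{j \ge 1} \int_X \|f_j(x) - g_j(x)\| \, d\mu(x)
  &= \sup_{j > N} \int_X \|f_j(x) - f_N(x)\| \, d\mu(x).
\end{align*}
```

By (110.10), the right side is less than any given $\epsilon > 0$ once $N$ is large enough, which is (110.1). The sequence $\{g_j\}_{j=1}^\infty$ is a martingale because it agrees with $\{f_j\}_{j=1}^\infty$ up to $j = N$ and is constant in $j$ afterwards, and it converges trivially since $g_j = f_N$ for all $j \ge N$.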

111 $\ell^1$-valued martingales

As before, let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{P}_1, \mathcal{P}_2, \ldots$ be a sequence of
partitions of $X$ into finitely or countably many measurable subsets with positive
measure such that $\mathcal{P}_{j+1}$ is a refinement of $\mathcal{P}_j$ for each $j$. Suppose that $\{f_j\}_{j=1}^\infty$
is a sequence of functions on $X$ with values in $\ell^1 = \ell^1(\mathbf{Z}_+)$. Thus for each
$x \in X$ and $j \ge 1$ we get a summable sequence $\{f_{j,k}(x)\}_{k=1}^\infty$ of real or complex
numbers. Of course, $f_j(x)$ is constant on each element of $\mathcal{P}_j$ if and only if $f_{j,k}(x)$
is constant on each element of $\mathcal{P}_j$ for each $k \ge 1$. If $\|f_j(x)\|_{\ell^1}$ is integrable on
$X$, then $f_{j,k}(x)$ is integrable on $X$ for each $k$, and

(111.1)    $\int_X \|f_j(x)\|_{\ell^1} \, d\mu(x) = \int_X \sum_{k=1}^\infty |f_{j,k}(x)| \, d\mu(x) = \sum_{k=1}^\infty \int_X |f_{j,k}(x)| \, d\mu(x)$.

Suppose now that $\{f_j\}_{j=1}^\infty$ is an $\ell^1$-valued martingale on $X$ with respect to
$\mathcal{B}_j = \mathcal{B}(\mathcal{P}_j)$. This implies that $\{f_{j,k}\}_{j=1}^\infty$ is a martingale on $X$ with respect to
the $\mathcal{B}_j$'s for each $k$. In particular,

(111.2)    $\int_X |f_{j,k}(x)| \, d\mu(x) \le \int_X |f_{j+1,k}(x)| \, d\mu(x)$

for each $j, k \ge 1$, and hence

(111.3)    $\int_X \|f_j(x)\|_{\ell^1} \, d\mu(x) \le \int_X \|f_{j+1}(x)\|_{\ell^1} \, d\mu(x)$

for each $j$.

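Suppose also that the $L^1(X)$ norm of $\|f_j(x)\|_{\ell^1}$ is bounded. Because of
monotonicity,

(111.4)    $\sup_{j \ge 1} \int_X \|f_j(x)\|_{\ell^1} \, d\mu(x) = \lim_{j \to \infty} \int_X \|f_j(x)\|_{\ell^1} \, d\mu(x)$,

and similarly

(111.5)    $\sup_{j \ge 1} \int_X |f_{j,k}(x)| \, d\mu(x) = \lim_{j \to \infty} \int_X |f_{j,k}(x)| \, d\mu(x)$

for each $k$. The monotone convergence theorem for sums implies that

(111.6)    $\lim_{j \to \infty} \int_X \|f_j(x)\|_{\ell^1} \, d\mu(x) = \sum_{k=1}^\infty \left( \lim_{j \to \infty} \int_X |f_{j,k}(x)| \, d\mu(x) \right)$.

Therefore

(111.7)    $\sup_{j \ge 1} \int_X \|f_j(x)\|_{\ell^1} \, d\mu(x) = \sum_{k=1}^\infty \left( \sup_{j \ge 1} \int_X |f_{j,k}(x)| \, d\mu(x) \right)$.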

Let $N$ be a large positive integer, and put

(111.8)    $g_{j,k}(x) = f_{j,k}(x)$, $h_{j,k}(x) = 0$ when $k \le N$,

(111.9)    $g_{j,k}(x) = 0$, $h_{j,k}(x) = f_{j,k}(x)$ when $k > N$.

If $g_j(x) = \{g_{j,k}(x)\}_{k=1}^\infty$, $h_j(x) = \{h_{j,k}(x)\}_{k=1}^\infty$, then $g_j(x), h_j(x) \in \ell^1$ and

(111.10)    $f_j(x) = g_j(x) + h_j(x)$.

Note that $\{g_j(x)\}_{j=1}^\infty$ converges for almost every $x \in X$, as a consequence of the
convergence almost everywhere of real or complex martingales with bounded $L^1$
norm. One can also check that $\|h_j(x)\|_{\ell^1}$ has small $L^1(X)$ norm, uniformly in $j$,
for sufficiently large $N$, by the discussion in the preceding paragraph. Thus
$\{f_j\}_{j=1}^\infty$ satisfies the criterion described in the previous section, and it follows
that $\{f_j(x)\}_{j=1}^\infty$ converges in $\ell^1$ for almost every $x \in X$.
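The smallness of the tails $h_j$ can be written out as follows. This is a sketch of the step left to the reader, using (111.1) applied to $h_j$ and the identity (111.7); the summation index $i$ is introduced only to avoid a clash with $j$.

```latex
\begin{align*}
\int_X \|h_j(x)\|_{\ell^1} \, d\mu(x)
  &= \sum_{k=N+1}^\infty \int_X |f_{j,k}(x)| \, d\mu(x)
  \le \sum_{k=N+1}^\infty \left( \sup_{i \ge 1} \int_X |f_{i,k}(x)| \, d\mu(x) \right).
\end{align*}
```

The right side does not depend on $j$, and it is the tail of the series in (111.7), which converges because the $L^1(X)$ norms of $\|f_j(x)\|_{\ell^1}$ are bounded. Hence it tends to $0$ as $N \to \infty$, which gives the uniform estimate needed in (110.1).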
112 Differentiability of paths



Let $(V, \|v\|)$ be a real or complex Banach space, and let $f : [a,b] \to V$ be a path
of finite length. Suppose that for each $\epsilon > 0$ there is a path $g : [a,b] \to V$ of
finite length such that the length of $f - g$ on $[a,b]$ is less than or equal to $\epsilon$ and
$g$ is differentiable almost everywhere on $[a,b]$. We would like to show that $f$ is
also differentiable almost everywhere on $[a,b]$.

If $a \le x \le b$ and $r > 0$, then let $\delta_r(f)(x)$ be the set of difference quotients

(112.1)    $\frac{f(x) - f(y)}{x - y}$,

where $a \le y \le b$ and $0 < |x - y| < r$. One can check that $f$ is differentiable at
$x$ if and only if

(112.2)    $\lim_{r \to 0} \mathrm{diam} \, \delta_r(f)(x) = 0$,

using the completeness of $V$ for the "if" part. Put $h = f - g$, and observe that

(112.3)    $\mathrm{diam} \, \delta_r(f)(x) \le \mathrm{diam} \, \delta_r(g)(x) + \mathrm{diam} \, \delta_r(h)(x)$

for every $x \in [a,b]$ and $r > 0$.
By hypothesis,

(112.4)    $\lim_{r \to 0} \mathrm{diam} \, \delta_r(g)(x) = 0$

for almost every $x \in [a,b]$. Hence

(112.5)    $\lim_{r \to 0} \mathrm{diam} \, \delta_r(f)(x) \le \sup_{r > 0} \mathrm{diam} \, \delta_r(h)(x)$

for almost every $x \in [a,b]$.

Using maximal functions as in Section 50, we get that for each $t > 0$ there
is an open set $E_t(h) \subseteq \mathbf{R}$ such that

(112.6)    $\|h(x) - h(y)\| \le t \, |x - y|$

when $E_t(h)$ does not contain the interval connecting $x, y \in [a,b]$, and

(112.7)    $|E_t(h)| \le 2 \, \epsilon \, t^{-1}$.

Here $|E_t(h)|$ denotes the Lebesgue measure of $E_t(h)$, as usual. Thus

(112.8)    $\sup_{r > 0} \mathrm{diam} \, \delta_r(h)(x) \le 2t$

for every $x \in [a,b] \setminus E_t(h)$. It follows that

(112.9)    $\lim_{r \to 0} \mathrm{diam} \, \delta_r(f)(x) \le 2t$

for almost every $x \in [a,b] \setminus E_t(h)$. Using these estimates for every $\epsilon, t > 0$, we
get that (112.2) holds for almost every $x \in [a,b]$, as desired.






113 Paths in $\ell^1$

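Let $f : [a,b] \to \ell^1 = \ell^1(\mathbf{Z}_+)$ be a path of finite length. Thus $f(x) = \{f_j(x)\}_{j=1}^\infty$,
where each $f_j$ is a real or complex-valued function on $[a,b]$ of bounded variation.
More precisely, let $l$ be a positive integer, and let $\mathcal{P}_1, \ldots, \mathcal{P}_l$ be partitions of $[a,b]$,
as in Section 41. Also let $\mathcal{P}$ be a partition of $[a,b]$ that is a common refinement of
$\mathcal{P}_1, \ldots, \mathcal{P}_l$. If $\Lambda_a^b(f, \mathcal{P})$ denotes the approximation to the length of $f$ associated
to $\mathcal{P}$, and similarly for the $f_j$'s and $\mathcal{P}_j$'s, then

(113.1)    $\sum_{j=1}^l \Lambda_a^b(f_j, \mathcal{P}_j) \le \sum_{j=1}^l \Lambda_a^b(f_j, \mathcal{P}) \le \Lambda_a^b(f, \mathcal{P})$.

Hence

(113.2)    $\sum_{j=1}^l \Lambda_a^b(f_j, \mathcal{P}_j) \le \Lambda_a^b(f)$,

where $\Lambda_a^b(f)$ denotes the length of $f$ on $[a,b]$. This implies that

(113.3)    $\sum_{j=1}^l \Lambda_a^b(f_j) \le \Lambda_a^b(f)$,

because $\mathcal{P}_1, \ldots, \mathcal{P}_l$ are arbitrary partitions of $[a,b]$. Therefore

(113.4)    $\sum_{j=1}^\infty \Lambda_a^b(f_j) \le \Lambda_a^b(f)$,

because $l \ge 1$ is arbitrary.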

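Similarly, if $\mathcal{P}$ is any partition of $[a,b]$, then

(113.5)    $\Lambda_a^b(f, \mathcal{P}) = \sum_{j=1}^\infty \Lambda_a^b(f_j, \mathcal{P}) \le \sum_{j=1}^\infty \Lambda_a^b(f_j)$.

This implies that

(113.6)    $\Lambda_a^b(f) \le \sum_{j=1}^\infty \Lambda_a^b(f_j)$.

It follows that

(113.7)    $\Lambda_a^b(f) = \sum_{j=1}^\infty \Lambda_a^b(f_j)$,

by the remarks in the preceding paragraph.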
Let $N$ be a large positive integer, and put

(113.8)    $g_j(x) = f_j(x)$, $h_j(x) = 0$ when $j \le N$,
           $g_j(x) = 0$, $h_j(x) = f_j(x)$ when $j > N$.

If $g(x) = \{g_j(x)\}_{j=1}^\infty$, $h(x) = \{h_j(x)\}_{j=1}^\infty$, then $g(x), h(x) \in \ell^1$ for every $x$ in
$[a,b]$, and

(113.9)    $f(x) = g(x) + h(x)$.

Observe that $f, g : [a,b] \to \ell^1$ have finite length, and that the length of $h$ on $[a,b]$
tends to $0$ as $N \to \infty$. We also know that $g$ is differentiable almost everywhere
on $[a,b]$, by the corresponding results for real or complex-valued functions. It
follows that $f$ is also differentiable almost everywhere on $[a,b]$, as in the previous
section.
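The equality in (113.5) for a fixed partition is easy to check when only finitely many components are nonzero. The sketch below assumes a path with values in $\mathbf{R}^3$, regarded as a subspace of $\ell^1$, sampled at the points of a partition; the function names and the sample path $(t, t^2, 1 - 2t)$ are hypothetical choices for this illustration.

```python
from fractions import Fraction


def variation(values):
    """Approximation to the length of a scalar function along a partition:
    the sum of |f_j(t_i) - f_j(t_{i-1})| over consecutive partition points."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def l1_length(path_values):
    """Approximation to the length of an l^1-valued path along a partition:
    the sum of the l^1 norms of the increments f(t_i) - f(t_{i-1})."""
    return sum(
        sum(abs(b - a) for a, b in zip(u, v))
        for u, v in zip(path_values, path_values[1:])
    )


def sample_path(points):
    """Values of the sample path t -> (t, t^2, 1 - 2t) at the partition points."""
    return [(t, t * t, 1 - 2 * t) for t in points]


def componentwise_total(path_values):
    """Sum over the components j of the variation of f_j along the partition."""
    dim = len(path_values[0])
    return sum(variation([p[j] for p in path_values]) for j in range(dim))
```

For the partition $0 < 1/4 < 1/2 < 1$ of $[0,1]$, both `l1_length` and `componentwise_total` applied to the sample path should give 4, since the three components have variations 1, 1 and 2 along this partition.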



114 $L^p$-valued functions

Let $(Y, \mathcal{B}, \nu)$ be a $\sigma$-finite measure space, and consider $\mathbf{R} \times Y$, equipped with
the product measure corresponding to Lebesgue measure on the real line. A
function $F(x,y) \in L^p(\mathbf{R} \times Y)$, $1 < p < \infty$, may be considered as representing
an $L^p$ function on $\mathbf{R}$ with values in $L^p(Y)$. Put

(114.1)    $F_p(x) = \left( \int_Y |F(x,y)|^p \, d\nu(y) \right)^{1/p}$,

which is the $L^p(Y)$ norm of $F(x,y)$ in $y$. By Fubini's theorem,

(114.2)    $\left( \int_{\mathbf{R} \times Y} |F(x,y)|^p \, dx \, d\nu(y) \right)^{1/p} = \left( \int_{\mathbf{R}} F_p(x)^p \, dx \right)^{1/p}$.

Thus the $L^p(\mathbf{R} \times Y)$ norm of $F(x,y)$ is the same as starting with the $L^p(Y)$
norm of $F(x,y)$ in $y$, and then taking the $L^p(\mathbf{R})$ norm of the result in $x$.
Put

(114.3)    $L(F)(x) = \limsup_{r \to 0} \frac{1}{2r} \int_{x-r}^{x+r} \left( \int_Y |F(t,y) - F(x,y)|^p \, d\nu(y) \right)^{1/p} dt$.

As in Section 47, we would like to say that

(114.4)    $L(F)(x) = 0$

for almost every $x \in \mathbf{R}$. As usual, there are two main ingredients in the proof.
The first is that this condition holds for a dense class of functions $F(x,y)$ in
$L^p(\mathbf{R} \times Y)$. In this case, one can use finite linear combinations of functions of
the form $f(x) \, g(y)$, where $f(x) \in L^p(\mathbf{R})$ and $g(y) \in L^p(Y)$. If one also takes
$f(x)$ to be continuous, then the limit is equal to $0$ at every $x \in \mathbf{R}$. If $Y$ is
a locally compact Hausdorff topological space and $\nu$ is a Borel measure on $Y$
with suitable regularity properties, then one can use continuous functions on
$\mathbf{R} \times Y$ with compact support as the dense class. Again the limit is equal to $0$
for every $x \in \mathbf{R}$ in this situation. The second main ingredient is an estimate for
an appropriate maximal function, which reduces here to the Hardy-Littlewood
maximal function of $F_p(x) \in L^p(\mathbf{R})$.
115 Continuous $L^p$-valued functions



Let $(Y, \mathcal{B}, \nu)$ be a measure space, and let $f$ be a continuous function on the real
line with values in $L^p(Y)$, $1 < p < \infty$. If $g \in L^p(Y)$, then

(115.1)    $\{y \in Y : g(y) \ne 0\}$

is $\sigma$-finite. Applying this to $f(r)$ for each rational number $r$, we get that there
is a $\sigma$-finite measurable set $Y_0 \subseteq Y$ such that $f(r) = 0$ on $Y \setminus Y_0$ for every $r \in \mathbf{Q}$.
This implies that $f(r) = 0$ almost everywhere on $Y \setminus Y_0$ for every $r \in \mathbf{R}$, because
$f : \mathbf{R} \to L^p(Y)$ is continuous. Thus we may as well suppose that $Y$ is $\sigma$-finite.

Let us now restrict our attention to the case where $f$ has compact support
on $\mathbf{R}$. More precisely, let $I = [a,b]$ be a closed interval in the real line such that
$f(x) = 0$ when $x \in \mathbf{R} \setminus [a,b]$. Consider the product $\mathbf{R} \times Y$ with the product
measure associated to Lebesgue measure on $\mathbf{R}$, as in the preceding section. We
would like to check that there is an $F(x,y) \in L^p(\mathbf{R} \times Y)$ such that



In particular, the $L^p(\mathbf{R} \times Y)$ norm of $F(x,y)$ would be bounded by a constant multiple
of the supremum norm of $\|f(x)\|_{L^p(Y)}$ on $I$. If $F(x,y), \widetilde{F}(x,y) \in L^p(\mathbf{R} \times Y)$ both
satisfy (115.3) for almost every $x \in \mathbf{R}$, then it follows that $F(x,y) = \widetilde{F}(x,y)$
for almost every $(x,y) \in \mathbf{R} \times Y$.

If $f(x) = \phi(x) \, g$ for some real or complex-valued function $\phi(x)$ with compact
support on $\mathbf{R}$ and some $g \in L^p(Y)$, then we can simply take $F(x,y) = \phi(x) \, g(y)$.
Similarly, if $f(x)$ is a finite linear combination of $L^p(Y)$-valued functions on $\mathbf{R}$
of this form, then it is easy to get $F(x,y)$. Otherwise, one can approximate $f(x)$
by a sequence $\{f_j(x)\}_{j=1}^\infty$ of $L^p(Y)$-valued functions of this type with respect to
the supremum norm of $\|f(x)\|_{L^p(Y)}$ on $I$. By construction, $f_j(x)$ corresponds to
a function $F_j(x,y)$ in $L^p(\mathbf{R} \times Y)$ for each $j$. Moreover, $\{F_j(x,y)\}_{j=1}^\infty$ is a Cauchy
sequence in $L^p(\mathbf{R} \times Y)$, because of (115.4). Hence $\{F_j(x,y)\}_{j=1}^\infty$ converges to a
function $F(x,y)$ in $L^p(\mathbf{R} \times Y)$. It is not too difficult to verify that this function
$F(x,y)$ satisfies (115.3), as desired.

116 Lipschitz $L^p$-valued functions

Let $(Y, \mathcal{B}, \nu)$ be a $\sigma$-finite measure space, and suppose that $f : \mathbf{R} \to L^p(Y)$ is
a Lipschitz mapping for some $1 < p < \infty$. It will be convenient to ask also
at first that $f(x)$ have compact support in $\mathbf{R}$, which is to say that there is a
closed interval $[a, b]$ in the real line such that $f(x) = 0$ when $x \in \mathbf{R} \setminus [a, b]$. Let
$F(x, y)$ be the function in $L^p(\mathbf{R} \times Y)$ that corresponds to $f(x)$ as in the previous
section. Because $f(x)$ has compact support, the ordinary Lipschitz condition
implies an integrated Lipschitz condition of the form

(116.1)    $\Big( \int_{\mathbf{R}} \|f(x + h) - f(x)\|_{L^p(Y)}^p \, dx \Big)^{1/p} \le C \, |h|$

for some $C \ge 0$ and every $h \in \mathbf{R}$. This implies that

(116.2)    $\Big( \int_{\mathbf{R} \times Y} |F(x + h, y) - F(x, y)|^p \, dx \, d\nu(y) \Big)^{1/p} \le C \, |h|.$
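The passage from the ordinary Lipschitz condition to the integrated one in (116.1) can be made explicit. The following is a sketch only, under the assumption that $L$ denotes a Lipschitz constant for $f$ (the symbol $L$ is introduced here for illustration and is not used in the notes); it produces one admissible value of $C$, not necessarily the best one.

```latex
% Assumptions (for illustration): \|f(x) - f(x')\|_{L^p(Y)} \le L \, |x - x'|
% for all x, x' in R, and f(x) = 0 when x is outside [a, b].
% The difference f(x + h) - f(x) vanishes unless x \in [a, b] or x + h \in [a, b],
% and this set of x has Lebesgue measure at most 2 (b - a).
\begin{align*}
\int_{\mathbf{R}} \|f(x + h) - f(x)\|_{L^p(Y)}^p \, dx
  &= \int_{[a, b] \cup [a - h, \, b - h]} \|f(x + h) - f(x)\|_{L^p(Y)}^p \, dx \\
  &\le 2 \, (b - a) \, L^p \, |h|^p .
\end{align*}
% Taking p-th roots gives (116.1) with C = (2 (b - a))^{1/p} L.
```

Compact support is what makes the region of integration have finite measure, uniformly in $h$.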

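Let $q$ be the exponent conjugate to $p$, so that $1/p + 1/q = 1$. If $h \in \mathbf{R}$, $h \ne 0$,
and $\Phi(x, y) \in L^q(\mathbf{R} \times Y)$, then put

(116.3)    $\lambda_h(\Phi) = \int_{\mathbf{R} \times Y} \frac{F(x + h, y) - F(x, y)}{h} \, \Phi(x, y) \, dx \, d\nu(y).$

This defines a bounded linear functional on $L^q(\mathbf{R} \times Y)$, with dual norm less
than or equal to $C$, by Hölder's inequality. We also have that

(116.4)    $\lambda_h(\Phi) = - \int_{\mathbf{R} \times Y} F(x, y) \, \frac{\Phi(x, y) - \Phi(x - h, y)}{h} \, dx \, d\nu(y),$

using the change of variables $x \mapsto x - h$. If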

(116.5)    $\Phi(x, y) = \phi(x) \, \psi(y),$

where $\phi(x)$ is a continuously-differentiable real or complex-valued function on
the real line with compact support and $\psi(y) \in L^q(Y)$, then we get that

(116.6)    $\lim_{h \to 0} \lambda_h(\Phi) = - \int_{\mathbf{R} \times Y} F(x, y) \, \phi'(x) \, \psi(y) \, dx \, d\nu(y).$

Similarly,

(116.7)    $\lim_{h \to 0} \lambda_h(\Phi)$

exists when $\Phi(x, y)$ is a finite linear combination of functions of this form. As
in Section [5H it follows that (116.7) exists for all $\Phi(x, y) \in L^q(\mathbf{R} \times Y)$, since
it exists for a dense linear subspace of $L^q(\mathbf{R} \times Y)$, and since the dual norms of
$\lambda_h$, $h \in \mathbf{R} \setminus \{0\}$, are bounded.

Thus (116.7) defines a bounded linear functional on $L^q(\mathbf{R} \times Y)$. By the Riesz
representation theorem, there is a function $G(x, y)$ in $L^p(\mathbf{R} \times Y)$ such that

(116.8)    $\lim_{h \to 0} \lambda_h(\Phi) = \int_{\mathbf{R} \times Y} G(x, y) \, \Phi(x, y) \, dx \, d\nu(y)$

for every $\Phi(x, y) \in L^q(\mathbf{R} \times Y)$. In particular,

(116.9)    $\int_{\mathbf{R} \times Y} F(x, y) \, \phi'(x) \, \psi(y) \, dx \, d\nu(y) = - \int_{\mathbf{R} \times Y} G(x, y) \, \phi(x) \, \psi(y) \, dx \, d\nu(y)$

when $\phi(x)$ is a real or complex-valued continuously-differentiable function on
the real line with compact support and $\psi(y) \in L^q(Y)$. Put

(116.10)    $f_\psi(x) = \int_Y f(x)(y) \, \psi(y) \, d\nu(y)$

for each $x \in \mathbf{R}$. More precisely, $f(x) \in L^p(Y)$ for every $x \in \mathbf{R}$, and $f_\psi(x)$ is the
integral of the product of this function with $\psi \in L^q(Y)$ over $Y$. Thus $f_\psi(x)$ is
a Lipschitz function on $\mathbf{R}$ with compact support for every $\psi \in L^q(Y)$, because
$f : \mathbf{R} \to L^p(Y)$ is a Lipschitz mapping with compact support. Using (116.9),
we get that

(116.11)    $\int_{\mathbf{R}} f_\psi(x) \, \phi'(x) \, dx = - \int_{\mathbf{R} \times Y} G(x, y) \, \phi(x) \, \psi(y) \, dx \, d\nu(y)$

for every $\phi(x)$, $\psi(y)$ as before. This implies that

(116.12)    $f_\psi'(x) = \int_Y G(x, y) \, \psi(y) \, d\nu(y)$

for every $\psi \in L^q(Y)$ in the sense of distributions, as in Section [SSI Hence

(116.13)    $f_\psi(t) - f_\psi(r) = \int_r^t \int_Y G(x, y) \, \psi(y) \, d\nu(y) \, dx$

for every $r, t \in \mathbf{R}$ with $r < t$ and $\psi \in L^q(Y)$. It follows that

(116.14)    $f(t) - f(r) = \int_r^t G(x, \cdot) \, dx$

when $r < t$, where both sides of the equation are elements of $L^p(Y)$.

Now that we have this expression for differences of the values of $f$, one can
use the analogue of Lebesgue's theorem in this context to conclude that $f$ is
differentiable almost everywhere as an $L^p(Y)$-valued function on the real line.
This works as well for Lipschitz mappings from the real line into $L^p(Y)$ that
may not have compact support, since the problem is local. This also works
for paths of finite length in $L^p(Y)$, $1 < p < \infty$, because of the approximation
arguments in Sections 50 and 51.

117 More duality

Let $(Y, \mathcal{B}, \nu)$ be a $\sigma$-finite measure space, and let $f$ be a continuous function
from the real line into $L^p(Y)$, $1 < p < \infty$. Suppose also that $f$ has compact
support in $\mathbf{R}$, and let $q$ be the exponent conjugate to $p$, so that $1/p + 1/q = 1$.
We would like to define a bounded linear functional on $L^q(\mathbf{R} \times Y)$ directly by

(117.1)    $\lambda(\Phi) = \int_{\mathbf{R}} \Big( \int_Y f(x)(y) \, \Phi(x, y) \, d\nu(y) \Big) \, dx.$

Because of Hölder's inequality,

(117.2)    $\Big| \int_Y f(x)(y) \, \Phi(x, y) \, d\nu(y) \Big| \le \|f(x)\|_{L^p(Y)} \Big( \int_Y |\Phi(x, y)|^q \, d\nu(y) \Big)^{1/q}$

and

(117.3)    $\int_{\mathbf{R}} \|f(x)\|_{L^p(Y)} \Big( \int_Y |\Phi(x, y)|^q \, d\nu(y) \Big)^{1/q} dx \le \Big( \int_{\mathbf{R}} \|f(x)\|_{L^p(Y)}^p \, dx \Big)^{1/p} \Big( \int_{\mathbf{R} \times Y} |\Phi(x, y)|^q \, dx \, d\nu(y) \Big)^{1/q}.$

However, one should be a bit careful about the measurability of

(117.4)    $\int_Y f(x)(y) \, \Phi(x, y) \, d\nu(y)$

as a function of $x$. If $\Phi(x, y) = \phi(x) \, \psi(y)$ for some $\phi(x) \in L^q(\mathbf{R})$, $\psi(y) \in L^q(Y)$,
then this reduces to

(117.5)    $\phi(x) \int_Y f(x)(y) \, \psi(y) \, d\nu(y).$

The continuity of $f : \mathbf{R} \to L^p(Y)$ implies that

(117.6)    $\int_Y f(x)(y) \, \psi(y) \, d\nu(y)$

is continuous in $x$, and so there is no problem in this case. Because linear
combinations of functions of this type are dense in $L^q(\mathbf{R} \times Y)$, one can use
this to extend $\lambda(\Phi)$ to all $\Phi \in L^q(\mathbf{R} \times Y)$. Similarly,

(117.7)    $\lambda_h(\Phi) = \int_{\mathbf{R}} \Big( \int_Y \frac{f(x + h)(y) - f(x)(y)}{h} \, \Phi(x, y) \, d\nu(y) \Big) \, dx$

can be defined more directly as a bounded linear functional on $L^q(\mathbf{R} \times Y)$ for
each $h \in \mathbf{R} \setminus \{0\}$. Equivalently,

(117.8)    $\lambda_h(\Phi) = - \int_{\mathbf{R}} \Big( \int_Y f(x)(y) \, \frac{\Phi(x, y) - \Phi(x - h, y)}{h} \, d\nu(y) \Big) \, dx.$

If $\Phi(x, y) = \phi(x) \, \psi(y)$, where now $\phi(x)$ is a continuously-differentiable function
on $\mathbf{R}$ with compact support and $\psi(y) \in L^q(Y)$, then it follows that

(117.9)    $\lim_{h \to 0} \lambda_h(\Phi) = - \int_{\mathbf{R}} \phi'(x) \Big( \int_Y f(x)(y) \, \psi(y) \, d\nu(y) \Big) \, dx.$

At this point, one can continue as in the preceding section when $f : \mathbf{R} \to L^p(Y)$
is Lipschitz.

118 $\ell^p$-Valued functions

Of course, the arguments in the previous sections can be simplified when the
functions take values in $\ell^p = \ell^p(\mathbf{Z}_+)$, and there are some commonalities with
$p = 1$. Suppose that $f(x) = \{f_j(x)\}_{j=1}^\infty$ is a Lipschitz function on the real line
with values in $\ell^p$, $1 < p \le \infty$. In particular, $f_j(x)$ is a Lipschitz function on $\mathbf{R}$
for each $j$, and hence is differentiable almost everywhere. Using the Lipschitz
condition for $f : \mathbf{R} \to \ell^p$, one can check that $\{f_j'(x)\}_{j=1}^\infty \in \ell^p$ for every $x \in \mathbf{R}$
such that $f_j'(x)$ exists for each $j$, with $\ell^p$ norm bounded by the Lipschitz constant
for $f$. We also have that

(118.1)    $f_j(t) - f_j(r) = \int_r^t f_j'(x) \, dx$

for every $r, t \in \mathbf{R}$ with $r < t$. If $p < \infty$, then one can use this to show that $f$
is differentiable almost everywhere on $\mathbf{R}$ as a mapping into $\ell^p$, with derivative
given by $\{f_j'(x)\}_{j=1}^\infty$. As usual, it is convenient to restrict one's attention initially
to functions $f$ with compact support, so that the $\ell^p$ norm of $\{f_j'(x)\}_{j=1}^\infty$ is in $L^p(\mathbf{R})$. As in the
$p = 1$ case, one can approximate $f$ by functions with only finitely many nonzero
components, for which differentiability almost everywhere is already known.
One can then use maximal function estimates to show that the errors are small
most of the time.

Note that a Lipschitz mapping from the real line into a separable Hilbert
space is differentiable almost everywhere, by the $p = 2$ case. This can be
extended to paths of finite length in a separable Hilbert space, because of the
approximation arguments in Sections 50 and 51. As in Section 45, any path
of finite length is continuous at all but finitely or countably many elements of
its domain, and hence is contained in a separable subspace of the range. This
implies that a path of finite length in any Hilbert space is differentiable almost
everywhere, because it is contained in a separable Hilbert subspace.

119 Products and $\sigma$-subalgebras

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2)$ be probability spaces, and let $X = X_1 \times X_2$ be
their Cartesian product, with the product measure $\mu = \mu_1 \times \mu_2$. Also let $\mathcal{B}_1$,
$\mathcal{B}_2$ be $\sigma$-subalgebras of $\mathcal{A}_1$, $\mathcal{A}_2$, respectively, and let $\mathcal{B}$ be the corresponding
$\sigma$-subalgebra of the $\sigma$-algebra of measurable subsets of $X$. If $\phi_1(x_1) \in L^1(X_1)$,
$\phi_2(x_2) \in L^1(X_2)$, then $\phi(x_1, x_2) = \phi_1(x_1) \, \phi_2(x_2) \in L^1(X)$, and we would like
to check that

(119.1)    $E_X(\phi \mid \mathcal{B}) = E_{X_1}(\phi_1 \mid \mathcal{B}_1) \, E_{X_2}(\phi_2 \mid \mathcal{B}_2),$

where the subscripts of $E$ are included to indicate the spaces on which the
conditional expectations are taken. Both sides of the equation are measurable
with respect to $\mathcal{B}$, and so it suffices to verify that

(119.2)    $\int_B E_X(\phi \mid \mathcal{B}) \, d\mu = \int_B E_{X_1}(\phi_1 \mid \mathcal{B}_1) \, E_{X_2}(\phi_2 \mid \mathcal{B}_2) \, d\mu$

for every $B \in \mathcal{B}$. This reduces to

(119.3)    $\int_B \phi \, d\mu = \int_B E_{X_1}(\phi_1 \mid \mathcal{B}_1) \, E_{X_2}(\phi_2 \mid \mathcal{B}_2) \, d\mu$

by the definition of the conditional expectation. If $B = B_1 \times B_2$ with $B_1 \in \mathcal{B}_1$,
$B_2 \in \mathcal{B}_2$, then both sides of this equation are equal to

(119.4)    $\Big( \int_{B_1} \phi_1 \, d\mu_1 \Big) \Big( \int_{B_2} \phi_2 \, d\mu_2 \Big),$

using the definition of the conditional expectation again. This implies that the
previous equation holds when $B$ is the union of finitely many pairwise-disjoint
sets of the form $B_1 \times B_2$, with $B_1 \in \mathcal{B}_1$ and $B_2 \in \mathcal{B}_2$. The analogous statement
for any $B \in \mathcal{B}$ follows by approximation. If $\mathcal{B}_1$ or $\mathcal{B}_2$ is generated by a partition
of $X_1$ or $X_2$ into finitely or countably many measurable sets, then every $B \in \mathcal{B}$
can be expressed as the union of finitely or countably many disjoint sets of
the form $B_1 \times B_2$, with $B_1 \in \mathcal{B}_1$ and $B_2 \in \mathcal{B}_2$, as in Section 110, and the
approximation is much simpler.
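In the simplest situation, where both $\sigma$-subalgebras are generated by finite partitions, the product formula (119.1) can be checked by direct computation. The following Python sketch does this with exact rational arithmetic; the helper `cond_exp` and all of the sample data are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def cond_exp(values, weights, partition):
    """Conditional expectation on a finite probability space with respect to
    the sigma-algebra generated by a partition: average over each block."""
    out = {}
    for block in partition:
        mass = sum(weights[x] for x in block)
        avg = sum(values[x] * weights[x] for x in block) / mass
        for x in block:
            out[x] = avg
    return out


# Two small probability spaces (hypothetical data).
mu1 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
mu2 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
phi1 = {0: Fraction(1), 1: Fraction(-2), 2: Fraction(5)}
phi2 = {0: Fraction(3), 1: Fraction(7), 2: Fraction(-1)}
part1 = [[0, 1], [2]]  # partition generating B_1
part2 = [[0], [1, 2]]  # partition generating B_2

# Product measure, product function, and the partition into rectangles.
mu = {(a, b): mu1[a] * mu2[b] for a, b in product(mu1, mu2)}
phi = {(a, b): phi1[a] * phi2[b] for a, b in product(phi1, phi2)}
part = [[(a, b) for a in p1 for b in p2] for p1 in part1 for p2 in part2]

lhs = cond_exp(phi, mu, part)  # E_X(phi | B)
e1 = cond_exp(phi1, mu1, part1)  # E_{X_1}(phi_1 | B_1)
e2 = cond_exp(phi2, mu2, part2)  # E_{X_2}(phi_2 | B_2)
rhs = {(a, b): e1[a] * e2[b] for (a, b) in mu}

assert lhs == rhs  # exact equality, as in (119.1)
```

Because the blocks of the product partition are rectangles, the average over a block factors as a product of averages, which is the content of (119.4) in this setting.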

120 $\sigma$-Subalgebras and vectors

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$. Also
let $V$ be a finite-dimensional real or complex vector space with a norm, which
can be identified with $\mathbf{R}^n$ or $\mathbf{C}^n$ for some $n$ using a basis. Thus a $V$-valued
function $f(x)$ on $X$ corresponds to an $n$-tuple $(f_1(x), \ldots, f_n(x))$ of real or complex-valued
functions on $X$. Such a function is considered to be integrable when its
components $f_1(x), \ldots, f_n(x)$ are integrable, in which case the integral is defined
by integrating the components separately. Similarly, the conditional expectation
of a $V$-valued function $f$ on $X$ may be defined by applying the conditional
expectation to the components of $f$.

Let $\lambda$ be a linear functional on $V$, so that $\lambda(v)$ can be expressed by a linear
combination of the components of $v$. If $f(x)$ is an integrable $V$-valued function
on $X$, then $\lambda(f(x))$ is an integrable real or complex-valued function on $X$, and

(120.1)    $\lambda\Big( \int_X f(x) \, d\mu(x) \Big) = \int_X \lambda(f(x)) \, d\mu(x).$

If $\|\cdot\|$ is a norm on $V$, and $\|\lambda\|_*$ is the corresponding dual norm on $V^*$, then it
follows that

(120.2)    $\Big| \lambda\Big( \int_X f(x) \, d\mu(x) \Big) \Big| \le \int_X |\lambda(f(x))| \, d\mu(x) \le \|\lambda\|_* \int_X \|f(x)\| \, d\mu(x).$

This implies that

(120.3)    $\Big\| \int_X f(x) \, d\mu(x) \Big\| \le \int_X \|f(x)\| \, d\mu(x),$

by the Hahn-Banach theorem. The same conclusion could also be obtained by
approximating the integral by finite sums.

Similarly,

(120.4)    $\lambda(E(f \mid \mathcal{B})) = E(\lambda(f) \mid \mathcal{B}),$

and hence

(120.5)    $|\lambda(E(f \mid \mathcal{B}))| \le E(|\lambda(f)| \mid \mathcal{B}) \le \|\lambda\|_* \, E(\|f\| \mid \mathcal{B}).$

This implies that

(120.6)    $\|E(f \mid \mathcal{B})\| \le E(\|f\| \mid \mathcal{B}).$

More precisely, if (120.5) holds at some point $x \in X$ for every linear functional
$\lambda$ on $V$, then (120.6) also holds at $x$, by the Hahn-Banach theorem. This works
as well when (120.5) holds for every $\lambda$ in a dense subset of

(120.7)    $\{\lambda \in V^* : \|\lambda\|_* = 1\}.$

Because $V$ and hence $V^*$ are finite-dimensional, there is a countable dense set
in (120.7). If (120.5) holds almost everywhere on $X$ for each $\lambda \in V^*$, then it
holds simultaneously for a countable set of $\lambda$'s almost everywhere on $X$. This
implies that (120.6) holds almost everywhere on $X$, as desired.
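The norm inequality (120.6) is easy to test numerically when $X$ is a finite probability space and the $\sigma$-subalgebra is generated by a partition. The sketch below is only an illustration: it takes $V = \mathbf{R}^3$ with an $\ell^p$ norm for $p = 3$, and the weights, the function `f`, and the partition are randomly generated or hypothetical.

```python
import random


def cond_exp_vec(f, weights, partition):
    """Componentwise conditional expectation of a vector-valued function on a
    finite probability space, for the sigma-algebra generated by a partition."""
    out = [None] * len(f)
    for block in partition:
        mass = sum(weights[x] for x in block)
        dim = len(f[block[0]])
        avg = tuple(
            sum(f[x][k] * weights[x] for x in block) / mass for k in range(dim)
        )
        for x in block:
            out[x] = avg
    return out


def norm(v, p=3.0):
    """The l^p norm on R^n (any norm on V would do here)."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


random.seed(0)
n_points = 12
raw = [random.random() + 0.1 for _ in range(n_points)]
weights = [w / sum(raw) for w in raw]  # a probability measure on 12 points
f = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(n_points)]
partition = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]

Ef = cond_exp_vec(f, weights, partition)  # E(f | B)
norms = [(norm(v),) for v in f]  # ||f|| as a 1-component function
E_norm = cond_exp_vec(norms, weights, partition)  # E(||f|| | B)

# gaps[x] = E(||f|| | B)(x) - ||E(f | B)(x)||, nonnegative by (120.6).
gaps = [E_norm[x][0] - norm(Ef[x]) for x in range(n_points)]
assert min(gaps) >= -1e-12
```

A small tolerance is used in the final comparison only to absorb floating-point rounding.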



121 Martingales and products

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be probability spaces, and suppose that their Cartesian
product $X \times Y$ is equipped with the product probability measure $\mu \times \nu$. Also
let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing sequence of $\sigma$-subalgebras of $\mathcal{A}$, and let
$\widetilde{\mathcal{B}}_j$ be the $\sigma$-subalgebra of the $\sigma$-algebra of measurable subsets of $X \times Y$ that
corresponds to $\mathcal{B}_j$ on $X$ and $\mathcal{B}$ on $Y$ in the product space. As before, a function
$F(x, y) \in L^p(X \times Y)$, $1 \le p < \infty$, may be considered as representing an $L^p$
function on $X$ with values in $L^p(Y)$, and thus a martingale on $X \times Y$ with respect
to $\widetilde{\mathcal{B}}_j$ may be considered as representing a type of vector-valued martingale on
$X$ with respect to $\mathcal{B}_j$.

If $F(x, y) \in L^1(X \times Y)$, then put

(121.1)    $I_Y(F)(x) = \int_Y F(x, y) \, d\nu(y).$

Let us check that

(121.2)    $I_Y(E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)) = E_X(I_Y(F) \mid \mathcal{B}_j)$

for each $j$, where the subscripts of $E$ indicate the spaces on which the conditional
expectations are taken. Both sides of the equation are measurable functions on
$X$ with respect to $\mathcal{B}_j$, and so it is enough to show that

(121.3)    $\int_A I_Y(E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)) \, d\mu = \int_A E_X(I_Y(F) \mid \mathcal{B}_j) \, d\mu$

for every $A \in \mathcal{B}_j$. Of course,

(121.4)    $\int_A I_Y(E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)) \, d\mu = \int_{A \times Y} E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j) \, d(\mu \times \nu) = \int_{A \times Y} F \, d(\mu \times \nu),$

because $A \times Y \in \widetilde{\mathcal{B}}_j$. Similarly,

(121.5)    $\int_A E_X(I_Y(F) \mid \mathcal{B}_j) \, d\mu = \int_A I_Y(F) \, d\mu = \int_{A \times Y} F \, d(\mu \times \nu).$

Let us say that a measurable function $F(x, y) \in L^1(X \times Y)$ is nice if there
are finitely many pairwise-disjoint measurable subsets $B_1, \ldots, B_n$ of $Y$ with
positive measure such that $F(x, y)$ is constant in $y$ on $B_k$ for $k = 1, \ldots, n$. If
$\phi_k(x) = F(x, y)$ when $y \in B_k$, then $\phi_k(x) \in L^1(X)$ for each $k$, and

(121.6)    $F(x, y) = \sum_{k=1}^n \phi_k(x) \, \mathbf{1}_{B_k}(y),$

where $\mathbf{1}_{B_k}$ is the indicator function associated to $B_k$ on $Y$, equal to $1$ when
$y \in B_k$ and to $0$ when $y \in Y \setminus B_k$. In this case,

(121.7)    $E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)(x, y) = \sum_{k=1}^n E_X(\phi_k \mid \mathcal{B}_j)(x) \, \mathbf{1}_{B_k}(y),$

as in Section 119. In effect, $F$ corresponds to a function on $X$ with values in an
$n$-dimensional vector space under these conditions.

Suppose that $F(x, y) \in L^p(X \times Y)$, $1 \le p < \infty$, and put

(121.8)    $N_p(F)(x) = \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p}.$

Thus $N_p(F) \in L^p(X)$, and

(121.9)    $\Big( \int_X N_p(F)(x)^p \, d\mu(x) \Big)^{1/p} = \Big( \int_{X \times Y} |F(x, y)|^p \, d(\mu \times \nu)(x, y) \Big)^{1/p}.$

We would like to check that

(121.10)    $N_p(E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j))(x) \le E_X(N_p(F) \mid \mathcal{B}_j)(x)$

for almost every $x \in X$ and each $j \ge 1$. If $p = 1$, then $N_1(F) = I_Y(|F|)$, and

(121.11)    $I_Y(|E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)|) \le I_Y(E_{X \times Y}(|F| \mid \widetilde{\mathcal{B}}_j)).$

Combining this with (121.2) applied to $|F|$ gives (121.10) when $p = 1$.
If $p > 1$ and $F$ is nice, then (121.10) follows from the discussion in the preceding
section. More precisely, one can take $V$ to be the $n$-dimensional vector space
spanned by $\mathbf{1}_{B_1}, \ldots, \mathbf{1}_{B_n}$, equipped with the $L^p(Y)$ norm. Otherwise, one can
approximate $F$ by nice functions in $L^p(X \times Y)$.

If $\{F_j\}_{j=1}^\infty$ is a martingale on $X \times Y$ with respect to the $\widetilde{\mathcal{B}}_j$'s such that
$F_j \in L^p(X \times Y)$ for each $j$, $1 \le p < \infty$, then it follows that $\{N_p(F_j)\}_{j=1}^\infty$ is
a submartingale on $X$ with respect to the $\mathcal{B}_j$'s. This leads to the same type
of maximal function estimates as before. If $\{F_j\}_{j=1}^\infty$ converges in $L^p(X \times Y)$,
then one may conclude that $\{F_j(x, \cdot)\}_{j=1}^\infty$ converges in $L^p(Y)$ for almost every
$x \in X$. In particular, this holds when $1 < p < \infty$ and the norm of $F_j(x, y)$ in
$L^p(X \times Y)$ is uniformly bounded in $j$.

Instead of (121.10), it is easier to show that

(121.12)    $I_Y(|E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)|^p) \le E_X(I_Y(|F|^p) \mid \mathcal{B}_j)$

almost everywhere on $X$. As in the $p = 1$ case, one has that

(121.13)    $I_Y(|E_{X \times Y}(F \mid \widetilde{\mathcal{B}}_j)|^p) \le I_Y(E_{X \times Y}(|F|^p \mid \widetilde{\mathcal{B}}_j)) = E_X(I_Y(|F|^p) \mid \mathcal{B}_j)$

when $p > 1$. If $\{F_j\}_{j=1}^\infty$ is a martingale on $X \times Y$ with respect to the $\widetilde{\mathcal{B}}_j$'s such
that $F_j \in L^p(X \times Y)$ for each $j$, then this implies the less precise statement that
$I_Y(|F_j|^p) = N_p(F_j)^p$ is a submartingale on $X$ with respect to the $\mathcal{B}_j$'s. One can
still get some maximal function estimates from this, which are adequate for the
same conclusions about pointwise convergence.

If $Y'$ is a $\sigma$-finite measure space, then one can choose a positive weight on
$Y'$ to get a probability measure, as in Section 109. This permits one to identify
$L^p(Y')$ with $L^p(Y)$ for a probability space $Y$, as before. Thus martingales on
$X$ with values in $L^p(Y')$ can be identified with martingales on $X$ with values in
$L^p(Y)$, to which the discussion in this section applies.

122 $\ell^p$-Valued martingales

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\{f_l(x)\}_{l=1}^\infty$ be a sequence of real or
complex-valued functions on $X$ such that $f_l(x) \in L^p(X)$ for each $l$, $1 < p < \infty$,
and

(122.1)    $\sum_{l=1}^\infty \int_X |f_l(x)|^p \, d\mu(x) < \infty.$

This is the same as

(122.2)    $\int_X \sum_{l=1}^\infty |f_l(x)|^p \, d\mu(x) < \infty,$

which implies that $\{f_l(x)\}_{l=1}^\infty \in \ell^p = \ell^p(\mathbf{Z}_+)$ for almost every $x \in X$. One can
also think of $\{f_l(x)\}_{l=1}^\infty$ as an element of $L^p(X \times \mathbf{Z}_+)$, where $X \times \mathbf{Z}_+$ is equipped
with the product measure associated to counting measure on $\mathbf{Z}_+$.

If $\mathcal{B}$ is a $\sigma$-subalgebra of $\mathcal{A}$, then of course one can take the conditional
expectation $E(f_l \mid \mathcal{B})$ of $f_l$ for each $l$, and

(122.3)    $\int_X |E(f_l \mid \mathcal{B})|^p \, d\mu \le \int_X E(|f_l|^p \mid \mathcal{B}) \, d\mu = \int_X |f_l|^p \, d\mu.$

Hence

(122.4)    $\sum_{l=1}^\infty \int_X |E(f_l \mid \mathcal{B})|^p \, d\mu \le \sum_{l=1}^\infty \int_X |f_l|^p \, d\mu.$

This is another way to look at conditional expectation of $\ell^p$-valued functions,
which is consistent with the earlier discussions.
More precisely,

(122.5)    $|E(f_l \mid \mathcal{B})|^p \le E(|f_l|^p \mid \mathcal{B})$

almost everywhere on $X$ for each $l$, and so

(122.6)    $\sum_{l=1}^\infty |E(f_l \mid \mathcal{B})|^p \le \sum_{l=1}^\infty E(|f_l|^p \mid \mathcal{B}) = E\Big( \sum_{l=1}^\infty |f_l|^p \,\Big|\, \mathcal{B} \Big)$

almost everywhere on $X$. As in Section 120,

(122.7)    $\Big( \sum_{l=1}^n |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} \le E\Big( \Big( \sum_{l=1}^n |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big)$

almost everywhere on $X$ for each $n \in \mathbf{Z}_+$. This implies that

(122.8)    $\Big( \sum_{l=1}^n |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} \le E\Big( \Big( \sum_{l=1}^\infty |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big)$

almost everywhere on $X$ for each $n$, and thus

(122.9)    $\Big( \sum_{l=1}^\infty |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} \le E\Big( \Big( \sum_{l=1}^\infty |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big).$

As in Section 109, one can choose a positive weight on $\mathbf{Z}_+$ to identify $\ell^p$ with
$L^p(Y)$, where $Y$ is a probability space. Thus the estimates in the preceding
paragraph can be seen as a special case of those in the previous section, with
simplifications from the discreteness of $Y$. As before, one can get submartingales
from the norms of $\ell^p$-valued martingales, and then maximal function estimates
for these. In particular, it follows that an $\ell^p$-valued martingale with bounded
$L^1$ norm converges almost everywhere, as in Section 111.

123 Approximation in product spaces

Let $(X_1, \mathcal{A}_1, \mu_1)$, $(X_2, \mathcal{A}_2, \mu_2)$ be measure spaces with $\mu_1(X_1), \mu_2(X_2) < \infty$,
and consider their Cartesian product $X = X_1 \times X_2$. The $\sigma$-algebra $\mathcal{A}$ of measurable
subsets of $X$ is defined as the smallest $\sigma$-algebra of subsets of $X$ that contains the
measurable rectangles $A_1 \times A_2$, $A_1 \in \mathcal{A}_1$, $A_2 \in \mathcal{A}_2$. Note that the intersection
of two measurable rectangles in $X$ is also a measurable rectangle, and that the
complement of a measurable rectangle is the union of three pairwise-disjoint
measurable rectangles, since

(123.1)    $(X_1 \times X_2) \setminus (A_1 \times A_2) = ((X_1 \setminus A_1) \times A_2) \cup (A_1 \times (X_2 \setminus A_2)) \cup ((X_1 \setminus A_1) \times (X_2 \setminus A_2)).$

Let $\mathcal{E}$ be the collection of subsets of $X$ that can be expressed as the union
of finitely many pairwise-disjoint measurable rectangles. This is an algebra of
subsets of $X$, by the previous observations. Also let $\mu = \mu_1 \times \mu_2$ be the product
measure associated to $\mu_1$, $\mu_2$ on $\mathcal{A}$. If

(123.2)    $d(A, B) = \mu(A \,\triangle\, B)$

is the corresponding semimetric on $\mathcal{A}$ as in Section 79, then $\mathcal{E}$ is dense in $\mathcal{A}$
with respect to $d(A, B)$. Depending on the way that the product measure is
defined, this may be obvious from the construction. At any rate, this follows
from the discussion in Section 79, which implies that the closure $\overline{\mathcal{E}}$ of $\mathcal{E}$ in $\mathcal{A}$ is
a $\sigma$-subalgebra of $\mathcal{A}$ that contains $\mathcal{E}$. One could also use the characterization
of $\mathcal{A}$ as the smallest monotone class of subsets of $X$ that contains $\mathcal{E}$. If $X_1$, $X_2$
are $\sigma$-finite and $A \subseteq X$ is a measurable set with $\mu(A) < \infty$, then one can first
approximate $A$ by subsets of products of measurable sets with finite measure,
and then continue as before to approximate $A$ by elements of $\mathcal{E}$. Using these
approximations, one can check that nice functions are dense in $L^p(X)$ when
$1 < p < \infty$, as in Section 121. Of course, these statements are much simpler
when $X_1$ or $X_2$ has only finitely or countably many elements and all of its
subsets are measurable, or when $\mathcal{A}_1$ or $\mathcal{A}_2$ is generated by a partition of the
corresponding space into finitely or countably many subsets.

124 Mixed norms

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be probability spaces, and let their Cartesian product
$X \times Y$ be equipped with the product measure $\mu \times \nu$, as usual. Consider the
space of real or complex-valued measurable functions $F(x, y)$ on $X \times Y$ such
that

(124.1)    $\int_X \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x)$

is finite, where $1 < p < \infty$. It is easy to see that this is a vector space, and that
(124.1) becomes a norm on this space when we identify functions that are equal
almost everywhere. If $F(x, y) \in L^p(X \times Y)$, then $F(x, y)$ is in this space, and
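(124.2)    $\int_X \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x) \le \Big( \int_{X \times Y} |F(x, y)|^p \, d(\mu \times \nu)(x, y) \Big)^{1/p}$

by Fubini's theorem and Jensen's inequality. Similarly, if $F(x, y)$ is in this space,
then $F(x, y) \in L^1(X \times Y)$, and

(124.3)    $\int_{X \times Y} |F(x, y)| \, d(\mu \times \nu)(x, y) \le \int_X \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x),$

again by Fubini's theorem and Jensen's inequality.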
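The two comparisons (124.2) and (124.3) say that the mixed norm (124.1) lies between the $L^1$ and $L^p$ norms on $X \times Y$. The following Python sketch checks this on a finite product of probability spaces; the sizes, the exponent `p = 2.5`, and the random data are assumptions made only for the sake of a concrete example.

```python
import random


def normalize(ws):
    """Rescale positive weights so that they form a probability measure."""
    total = sum(ws)
    return [w / total for w in ws]


random.seed(1)
p = 2.5
mu = normalize([random.random() + 0.1 for _ in range(5)])  # measure on X
nu = normalize([random.random() + 0.1 for _ in range(7)])  # measure on Y
F = [[random.uniform(-3, 3) for _ in nu] for _ in mu]  # F[x][y]


def N_p(row):
    """N_p(F)(x): the L^p(Y) norm of the row y -> F(x, y)."""
    return sum(abs(v) ** p * w for v, w in zip(row, nu)) ** (1 / p)


# L^1(X x Y) norm of F.
l1_norm = sum(m * sum(abs(v) * w for v, w in zip(row, nu)) for m, row in zip(mu, F))
# Mixed norm (124.1): L^1 in x of the L^p norm in y.
mixed_norm = sum(m * N_p(row) for m, row in zip(mu, F))
# L^p(X x Y) norm of F.
lp_norm = sum(m * N_p(row) ** p for m, row in zip(mu, F)) ** (1 / p)

assert l1_norm <= mixed_norm + 1e-12  # (124.3)
assert mixed_norm <= lp_norm + 1e-12  # (124.2)
```

Both inequalities rely on $\mu$ and $\nu$ being probability measures, which is why the weights are normalized first.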
Suppose that $F(x, y)$ is in this space, and put

(124.4)    $N_p(F)(x) = \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p}$

as in Section 121. Thus (124.1) is the same as the $L^1(X)$ norm of $N_p(F)(x)$. If
$L \ge 0$, then define $F_L(x, y)$ on $X \times Y$ by

(124.5)    $F_L(x, y) = F(x, y)$ when $N_p(F)(x) \le L$,
                      $= 0$ when $N_p(F)(x) > L$.

In particular, $N_p(F_L)(x) = N_p(F)(x)$ when $N_p(F)(x) \le L$, and $N_p(F_L)(x) = 0$
when $N_p(F)(x) > L$. It follows that $F_L(x, y) \in L^p(X \times Y)$ for each $L$, and that
$F_L(x, y)$ converges to $F(x, y)$ with respect to the norm (124.1) as $L \to \infty$, so
that $L^p(X \times Y)$ is a dense linear subspace of this space.

Let $\{F_j(x, y)\}_{j=1}^\infty$ be a sequence of measurable functions on $X \times Y$. By
Fatou's lemma,

(124.6)    $\int_Y \liminf_{j \to \infty} |F_j(x, y)|^p \, d\nu(y) \le \liminf_{j \to \infty} \int_Y |F_j(x, y)|^p \, d\nu(y)$

for every $x \in X$. Equivalently,

(124.7)    $\Big( \int_Y \big( \liminf_{j \to \infty} |F_j(x, y)| \big)^p \, d\nu(y) \Big)^{1/p} \le \liminf_{j \to \infty} \Big( \int_Y |F_j(x, y)|^p \, d\nu(y) \Big)^{1/p}$

for each $x$. Applying Fatou's lemma a second time, we get that

(124.8)    $\int_X \Big( \int_Y \big( \liminf_{j \to \infty} |F_j(x, y)| \big)^p \, d\nu(y) \Big)^{1/p} d\mu(x) \le \liminf_{j \to \infty} \int_X \Big( \int_Y |F_j(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x).$

Suppose that $\{F_j(x, y)\}_{j=1}^\infty$ converges almost everywhere to $F(x, y)$ on $X \times Y$.
It follows that for almost every $x \in X$, $\{F_j(x, y)\}_{j=1}^\infty$ converges to $F(x, y)$ for
almost every $y \in Y$. Hence

(124.9)    $\int_Y |F(x, y)|^p \, d\nu(y) \le \liminf_{j \to \infty} \int_Y |F_j(x, y)|^p \, d\nu(y)$

for almost every $x \in X$. This implies that

(124.10)    $\int_X \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x) \le \liminf_{j \to \infty} \int_X \Big( \int_Y |F_j(x, y)|^p \, d\nu(y) \Big)^{1/p} d\mu(x),$

as before.

125 Mixed-norm martingales

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be probability spaces, and let $X \times Y$ be equipped with
the product measure $\mu \times \nu$. Also let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing sequence
of $\sigma$-subalgebras of $\mathcal{A}$, and let $\widetilde{\mathcal{B}}_j$ be the $\sigma$-algebra of subsets of $X \times Y$ that
corresponds to $\mathcal{B}_j$ on $X$ and $\mathcal{B}$ on $Y$ in the product space. Suppose that $F(x, y)$
is a measurable function on $X \times Y$ for which (124.1) is finite, $1 < p < \infty$. In
particular, $F(x, y) \in L^1(X \times Y)$, and so

(125.1)    $F_j = E(F \mid \widetilde{\mathcal{B}}_j)$

defines a martingale on $X \times Y$ with respect to $\widetilde{\mathcal{B}}_j$.
As in Section 121,

(125.2)    $N_p(F_j) \le E_X(N_p(F) \mid \mathcal{B}_j)$

almost everywhere on $X$ for each $j \ge 1$, where the subscript $X$ of $E$ indicates
that the conditional expectation is taken on $X$. More precisely, this is the same
as (121.10) when $F(x, y) \in L^p(X \times Y)$, and otherwise we can approximate
$F(x, y)$ by elements of $L^p(X \times Y)$ with respect to the norm (124.1), as in the
previous section. Integrating (125.2) over $X$, we get that

(125.3)    $\int_X N_p(F_j) \, d\mu \le \int_X E_X(N_p(F) \mid \mathcal{B}_j) \, d\mu = \int_X N_p(F) \, d\mu$

for each $j$. Thus the norm of $F_j$ with respect to (124.1) is less than or equal to
(124.1) for each $j$.

One can also check that $\{F_j\}_{j=1}^\infty$ converges to $F$ with respect to the norm
(124.1). If $F \in L^p(X \times Y)$, then $\{F_j\}_{j=1}^\infty$ converges to $F$ with respect to the $L^p$
norm, and hence with respect to (124.1). Otherwise, one can approximate $F$
by elements of $L^p(X \times Y)$, using the uniform bound for the norm of $F_j$ in the
previous paragraph.

If we apply (125.2) to $F_{j+1}$ instead of $F$, then we get that

(125.4)    $N_p(F_j) \le E_X(N_p(F_{j+1}) \mid \mathcal{B}_j)$

almost everywhere on $X$ for each $j \ge 1$. Thus $\{N_p(F_j)\}_{j=1}^\infty$ is a submartingale
on $X$ with respect to the $\mathcal{B}_j$'s, which leads to maximal function estimates as
before. Using convergence of $\{F_j\}_{j=1}^\infty$ to $F$ with respect to the norm (124.1),
one can show that $\{F_j(x, \cdot)\}_{j=1}^\infty$ converges to $F(x, \cdot)$ in $L^p(Y)$ for almost every
$x \in X$. This is basically the same as in the previous situations, once we have
the same ingredients as before.

126 Mixed-norm convergence

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be probability spaces, and let $X \times Y$ be equipped with
$\mu \times \nu$, as usual. Also let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing sequence of $\sigma$-subalgebras
of $\mathcal{A}$, and let $\widetilde{\mathcal{B}}_j$ be the $\sigma$-algebra of subsets of $X \times Y$ that corresponds to $\mathcal{B}_j$
on $X$ and $\mathcal{B}$ on $Y$ in the product space. Suppose that $\{F_j\}_{j=1}^\infty$ is a martingale
on $X \times Y$ with respect to the $\widetilde{\mathcal{B}}_j$'s whose norms as in (124.1) are uniformly
bounded for some $p > 1$. Equivalently,

(126.1)    $\int_X N_p(F_j) \, d\mu \le C$

for some $C \ge 0$ and each $j$. Note that $\{N_p(F_j)\}_{j=1}^\infty$ is a submartingale on $X$
with respect to the $\mathcal{B}_j$'s, as in (125.4).

Suppose in addition that $\{N_p(F_j)\}_{j=1}^\infty$ is uniformly integrable on $X$, as in
Section 83, and let us check that $\{F_j\}_{j=1}^\infty$ is uniformly integrable on $X \times Y$. Let
$\epsilon > 0$ be given, and choose $\delta > 0$ such that

(126.2)    $\int_A N_p(F_j) \, d\mu < \frac{\epsilon}{2}$

for every measurable set $A \subseteq X$ with $\mu(A) < \delta$ and each $j$. If

(126.3)    $A_{j,L} = \{x \in X : N_p(F_j)(x) > L\},$

then

(126.4)    $\mu(A_{j,L}) \le L^{-1} \, C$

for each $j$, $L$, by Tchebychev's inequality. Hence $\mu(A_{j,L}) < \delta$ for each $j$ when
$L$ is sufficiently large, which implies that

(126.5)    $\int_{A_{j,L} \times Y} |F_j| \, d(\mu \times \nu) \le \int_{A_{j,L}} N_p(F_j) \, d\mu < \frac{\epsilon}{2}$

for each $j$ when $L$ is sufficiently large. On the complement of $A_{j,L} \times Y$, we have
that

(126.6)    $\int_{(X \setminus A_{j,L}) \times Y} |F_j|^p \, d(\mu \times \nu) = \int_{X \setminus A_{j,L}} N_p(F_j)^p \, d\mu \le L^p$

for each $j$, $L$, by the definition of $A_{j,L}$. Let $q$ be the exponent conjugate to $p$,
so that $1/p + 1/q = 1$. If $B \subseteq X \times Y$ is measurable, then

(126.7)    $\int_B |F_j| \, d(\mu \times \nu) \le ((\mu \times \nu)(B))^{1/q} \Big( \int_B |F_j|^p \, d(\mu \times \nu) \Big)^{1/p},$

by Hölder's inequality. If $B \subseteq (X \setminus A_{j,L}) \times Y$, then it follows that

(126.8)    $\int_B |F_j| \, d(\mu \times \nu) \le L \, ((\mu \times \nu)(B))^{1/q}.$

In order to show that $\{F_j\}_{j=1}^\infty$ is uniformly integrable, one can combine this with
the earlier estimate for the integral of $|F_j|$ over $A_{j,L} \times Y$ when $L$ is sufficiently
large.

If $\{F_j\}_{j=1}^\infty$ is uniformly integrable on $X \times Y$, then $\{F_j\}_{j=1}^\infty$ converges in
$L^1(X \times Y)$ to a function $F$, and $F_j = E(F \mid \widetilde{\mathcal{B}}_j)$ for each $j$. Moreover, $\{F_j\}_{j=1}^\infty$
converges to $F$ almost everywhere on $X \times Y$, which implies that the norm of $F$
with respect to (124.1) is also finite, as in Section 124. Thus we are back in the
situation of the preceding section. This implies that $\{F_j\}_{j=1}^\infty$ also converges to
$F$ with respect to the norm (124.1), and that $\{F_j(x, \cdot)\}_{j=1}^\infty$ converges to $F(x, \cdot)$
in $L^p(Y)$ for almost every $x \in X$.

Suppose now that $\{N_p(F_j)\}_{j=1}^\infty$ is still bounded in $L^1(X)$, but may not be
uniformly integrable. Because $\{N_p(F_j)\}_{j=1}^\infty$ is a submartingale with respect to
the $\mathcal{B}_j$'s, the corresponding maximal function can be estimated in the usual way.
In this case, $\{F_j\}_{j=1}^\infty$ can be approximated by martingales $\{G_j\}_{j=1}^\infty$ on $X \times Y$
such that $\{N_p(G_j)\}_{j=1}^\infty$ is uniformly integrable, as in Section[Sni More precisely,
the approximation basically takes place in the $x$ variable. This permits one to
show that $\{F_j(x, \cdot)\}_{j=1}^\infty$ converges in $L^p(Y)$ for almost every $x \in X$, as before.

127 The $\ell^p$ version

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $1 \le p < \infty$ be given. If $\{f_l(x)\}_{l=1}^\infty$
is a sequence of real or complex-valued measurable functions on $X$ such that

(127.1)    $\int_X \Big( \sum_{l=1}^\infty |f_l(x)|^p \Big)^{1/p} d\mu(x)$

is finite, then

(127.2)    $\sum_{l=1}^\infty |f_l(x)|^p < \infty$

for almost every $x \in X$. It is easy to see that the space of sequences of functions
on $X$ of this type is a vector space, and that (127.1) defines a norm on this vector
space when we identify functions that are equal almost everywhere on $X$. We
can also use a weight on the set of positive integers to identify $\ell^p$ with $L^p(Y)$
for a probability space $Y$, so that this expression is the same as (124.1).

If $\{f_l(x)\}_{l=1}^\infty$ is a sequence of functions in $L^p(X)$ such that

(127.3)    $\sum_{l=1}^\infty \int_X |f_l(x)|^p \, d\mu(x) = \int_X \sum_{l=1}^\infty |f_l(x)|^p \, d\mu(x) < \infty,$

then (127.1) is also finite, because

(127.4)    $\int_X \Big( \sum_{l=1}^\infty |f_l(x)|^p \Big)^{1/p} d\mu(x) \le \Big( \int_X \sum_{l=1}^\infty |f_l(x)|^p \, d\mu(x) \Big)^{1/p},$

by Jensen's inequality. These sequences of functions are dense among those for
which (127.1) is finite, with respect to the norm (127.1), for the same reasons as
in Section 124. Of course, these two conditions on sequences of functions on $X$
are the same when $p = 1$. Alternatively, if $\{f_l(x)\}_{l=1}^\infty$ is a sequence of functions
on $X$ for which (127.1) is finite, then

(127.5)    $\lim_{n \to \infty} \int_X \Big( \sum_{l=n}^\infty |f_l(x)|^p \Big)^{1/p} d\mu(x) = 0,$

by the dominated convergence theorem. This implies that $\{f_l(x)\}_{l=1}^\infty$ can be
approximated by sequences of functions for which all but finitely many terms
are equal to $0$ with respect to the norm (127.1).

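Let $f(x) = \{f_l(x)\}_{l=1}^\infty$ be a sequence of measurable functions on $X$ for which
(127.1) is finite, and let $\mathcal{B}$ be a $\sigma$-subalgebra of $\mathcal{A}$. As in Sections 120 and 122,

(127.6)    $\Big( \sum_{l=1}^n |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} \le E\Big( \Big( \sum_{l=1}^n |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big)$

almost everywhere on $X$ for each $n$, and hence

(127.7)    $\Big( \sum_{l=1}^\infty |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} \le E\Big( \Big( \sum_{l=1}^\infty |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big)$

almost everywhere on $X$. In particular,

(127.8)    $\int_X \Big( \sum_{l=1}^\infty |E(f_l \mid \mathcal{B})|^p \Big)^{1/p} d\mu \le \int_X E\Big( \Big( \sum_{l=1}^\infty |f_l|^p \Big)^{1/p} \,\Big|\, \mathcal{B} \Big) \, d\mu = \int_X \Big( \sum_{l=1}^\infty |f_l|^p \Big)^{1/p} d\mu.$

Now let $\mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots$ be an increasing sequence of $\sigma$-subalgebras of $\mathcal{A}$.
Also let $f_j(x) = \{f_{j,l}(x)\}_{l=1}^\infty$ be a sequence of measurable functions with respect
to $\mathcal{B}_j$ for which (127.1) is finite for each $j$, and put

(127.9)    $A_j = \int_X \Big( \sum_{l=1}^\infty |f_{j,l}(x)|^p \Big)^{1/p} d\mu(x).$

Suppose that $\{f_{j,l}\}_{j=1}^\infty$ is a martingale with respect to this filtration for each $l$,
so that $f_{j,l} = E(f_{j+1,l} \mid \mathcal{B}_j)$ for each $j, l \ge 1$. Thus

(127.10)    $\|f_j(x)\|_p = \Big( \sum_{l=1}^\infty |f_{j,l}(x)|^p \Big)^{1/p}$

is a submartingale with respect to this filtration, as in the previous paragraph.
This implies that $\{A_j\}_{j=1}^\infty$ is monotone increasing, as usual. Similarly, if

(127.11)    $A_{j,n} = \int_X \Big( \sum_{l=1}^n |f_{j,l}(x)|^p \Big)^{1/p} d\mu(x),$

then $A_{j,n} \le A_{j+1,n}$ for each $j, n \ge 1$. Note that

(127.12)    $\lim_{n \to \infty} A_{j,n} = A_j$

for each $j$, by the dominated convergence theorem.
Suppose that the $A_j$'s are bounded, and put

(127.13)    $A = \sup_{j \ge 1} A_j.$

Let $\delta > 0$ be given, and choose $j_0$ such that

(127.14)    $A_{j_0} > A - \delta.$

Because $A_{j_0,n} \to A_{j_0}$ as $n \to \infty$, we can choose $n_0$ so that

(127.15)    $A_{j_0,n_0} > A - \delta.$

If $j \ge j_0$, then monotonicity implies that

(127.16)    $A_{j,n_0} > A - \delta.$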

Let us pause a moment to record some elementary inequalities that will be
helpful later. If $a, b \ge 0$, then

(127.17)    $(a + b)^{1/p} \ge a^{1/p} + p^{-1} \, (a + b)^{(1/p) - 1} \, b.$

This follows from calculus, because

(127.18)    $\frac{d}{dt} (a + t)^{1/p} = p^{-1} \, (a + t)^{(1/p) - 1}$

is minimized on $[0, b]$ at $t = b$. Remember that $0 < 1/p \le 1$, because $1 \le p < \infty$.
If $b \ge \epsilon \, a$ for some $\epsilon > 0$, then $a + b \le (\epsilon^{-1} + 1) \, b$, and so

(127.19)    $(a + b)^{1/p} \ge a^{1/p} + p^{-1} \, (\epsilon^{-1} + 1)^{(1/p) - 1} \, b^{1/p}.$

This implies that

(127.20)    $b^{1/p} \le \epsilon^{1/p} \, a^{1/p} + p \, (\epsilon^{-1} + 1)^{1 - (1/p)} \, ((a + b)^{1/p} - a^{1/p})$

for every $\epsilon > 0$. More precisely, $b^{1/p}$ is less than or equal to the second term
on the right when $b \ge \epsilon \, a$, by the previous inequality, and otherwise $b^{1/p}$ is less
than or equal to the first term on the right, because $b < \epsilon \, a$. Note that (127.20)
also holds when $a = 0$ or $b = 0$.
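As a sanity check on the elementary inequality (127.20), one can evaluate both sides at many randomly chosen values of $a$, $b$, $p$, and $\epsilon$. The Python sketch below does this; the function name `margin`, the sampling ranges, and the tolerance are arbitrary choices made for illustration, and a numerical test of this kind is of course no substitute for the calculus argument.

```python
import random


def margin(a, b, p, eps):
    """Right side minus left side of (127.20); should be nonnegative."""
    first = eps ** (1 / p) * a ** (1 / p)
    second = p * (1 / eps + 1) ** (1 - 1 / p) * ((a + b) ** (1 / p) - a ** (1 / p))
    return first + second - b ** (1 / p)


random.seed(2)
worst_margin = float("inf")
for _ in range(20000):
    p = random.uniform(1.0, 10.0)  # 1 <= p < infinity
    eps = 10 ** random.uniform(-2, 2)  # epsilon > 0 on a logarithmic scale
    a = 10 ** random.uniform(-2, 2)
    b = 10 ** random.uniform(-2, 2)
    worst_margin = min(worst_margin, margin(a, b, p, eps))

# Allow a small tolerance for floating-point rounding.
assert worst_margin >= -1e-9
```

Sampling on a logarithmic scale exercises both regimes in the proof, namely $b \ge \epsilon \, a$ and $b < \epsilon \, a$.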
Let us apply (127.20) to

(127.21)    $a = \sum_{l=1}^n |f_{j,l}(x)|^p, \quad b = \sum_{l=n+1}^\infty |f_{j,l}(x)|^p,$

using also the fact that $a^{1/p} \le (a+b)^{1/p} = \|f_j(x)\|_p$. This implies that

(127.22)    $\Big(\sum_{l=n+1}^\infty |f_{j,l}(x)|^p\Big)^{1/p} \le \epsilon^{1/p} \|f_j(x)\|_p + p \, (\epsilon^{-1}+1)^{1-(1/p)} \Big(\|f_j(x)\|_p - \Big(\sum_{l=1}^n |f_{j,l}(x)|^p\Big)^{1/p}\Big).$

Integrating over $X$, we get that

(127.23)    $\int_X \Big(\sum_{l=n+1}^\infty |f_{j,l}(x)|^p\Big)^{1/p} d\mu(x) \le \epsilon^{1/p} A + p \, (\epsilon^{-1}+1)^{1-(1/p)} (A - A_{j,n}),$

using also the fact that $A_j \le A$ for each $j$, by the definition of $A$. Taking
$n = n_0$, we get that

(127.24)    $\int_X \Big(\sum_{l=n_0+1}^\infty |f_{j,l}(x)|^p\Big)^{1/p} d\mu(x) \le \epsilon^{1/p} A + p \, (\epsilon^{-1}+1)^{1-(1/p)} \, \delta$

when $j \ge j_0$, by (127.16).

If $\eta > 0$ is given, then we can first choose $\epsilon > 0$ so that $\epsilon^{1/p} A < \eta/2$, and
then choose $\delta$ depending on $\epsilon$ such that $p \, (\epsilon^{-1}+1)^{1-(1/p)} \, \delta < \eta/2$. If $j_0$, $n_0$ are
as before, then (127.24) implies that

(127.25)    $\int_X \Big(\sum_{l=n_0+1}^\infty |f_{j,l}(x)|^p\Big)^{1/p} d\mu(x) < \eta$

when $j \ge j_0$. The integral on the left side of (127.25) is actually monotone
increasing in $j$, for the usual submartingale reasons, which implies that (127.25)
holds for every $j$. Put $g_{j,l}(x) = f_{j,l}(x)$ when $l \le n_0$ and $g_{j,l}(x) = 0$ when
$l > n_0$, so that $\{g_{j,l}\}_{j=1}^\infty$ is a martingale with respect to the $\mathcal{B}_j$'s for each $l$,
and $f_j(x) = \{f_{j,l}(x)\}_{l=1}^\infty$ is approximated by $g_j(x) = \{g_{j,l}(x)\}_{l=1}^\infty$ uniformly in
$j$ with respect to the norm (127.1), by (127.25). Using this approximation and
maximal function estimates for $\|f_j(x) - g_j(x)\|_p$, one can show that $\{f_j(x)\}_{j=1}^\infty$
converges in $\ell^p$ for almost every $x \in X$, as in Sections 101 and 111.

128 The doubling condition



Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\mathcal{P}_0, \mathcal{P}_1, \mathcal{P}_2, \ldots$ be a sequence of
partitions of $X$ into finitely many measurable sets of positive measure such that
$\mathcal{P}_{j+1}$ is a refinement of $\mathcal{P}_j$ for each $j$ and $\mathcal{P}_0$ is the trivial partition consisting
of only $X$ itself. We say that the $\mathcal{P}_j$'s satisfy a doubling condition if there is a
$C \ge 1$ such that

(128.1)    $\mu(A) \le C \, \mu(B)$

when $A \in \mathcal{P}_j$, $B \in \mathcal{P}_{j+1}$, and $B \subseteq A$. This implies that for each $A \in \mathcal{P}_j$ there
are less than or equal to $C$ sets $B \in \mathcal{P}_{j+1}$ such that $B \subseteq A$. In particular, this
implies that $\mathcal{P}_j$ has less than or equal to $C^j$ elements for each $j$. If $X = [0,1)$
is equipped with Lebesgue measure and $\mathcal{P}_j$ consists of the dyadic subintervals
of $[0,1)$ with length $2^{-j}$, then (128.1) holds with $C = 2$.

Let $\mathcal{B}_j = \mathcal{B}(\mathcal{P}_j)$ be the $\sigma$-subalgebra of $\mathcal{A}$ generated by $\mathcal{P}_j$, as in Section 77.
Thus $\mathcal{B}_j \subseteq \mathcal{B}_{j+1}$ for each $j$, since $\mathcal{P}_{j+1}$ is supposed to be a refinement of $\mathcal{P}_j$.
If $f_{j+1}(x)$ is a nonnegative real-valued function on $X$ which is measurable with
respect to $\mathcal{B}_{j+1}$ for some $j \ge 0$, then

(128.2)    $f_{j+1} \le C \, E(f_{j+1} \mid \mathcal{B}_j).$

If $\{f_j\}_{j=0}^\infty$ is a martingale with respect to this filtration consisting of nonnegative
real-valued functions, then

(128.3)    $f_{j+1} \le C \, f_j$

for each $j$.
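
To make (128.2) and (128.3) concrete in the dyadic case on $[0,1)$, where $C = 2$, here is a small sketch. It assumes that a function which is constant on the dyadic intervals of length $2^{-j}$ is stored as the list of its $2^j$ values from left to right; the function names are illustrative only.

```python
def conditional_expectation(values):
    """E(f | B_j) for f constant on the 2^(j+1) dyadic intervals of length 2^-(j+1).

    `values` lists f on those intervals from left to right; the result lists the
    averages over the 2^j parent intervals of length 2^-j.
    """
    return [(values[2 * k] + values[2 * k + 1]) / 2 for k in range(len(values) // 2)]


def satisfies_128_2(values, C=2.0):
    """Check f <= C * E(f | B_j) pointwise, as in (128.2)."""
    parents = conditional_expectation(values)
    return all(v <= C * parents[i // 2] for i, v in enumerate(values))


def dyadic_martingale(values, depth):
    """Return [f_0, ..., f_depth], obtained from f_depth by repeated averaging."""
    assert len(values) == 2 ** depth
    levels = [list(values)]
    for _ in range(depth):
        levels.append(conditional_expectation(levels[-1]))
    return levels[::-1]
```

Starting from the nonnegative values `[0, 8, 1, 3, 5, 5, 2, 0]` at level 3, every level of the resulting martingale passes `satisfies_128_2`, while a function that takes negative values, such as `[-1.0, 0.0]`, need not.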

Suppose now that $\{\phi_j\}_{j=0}^\infty$ is a submartingale on $X$ with respect to this
filtration consisting of nonnegative real-valued functions, and put

(128.4)    $\psi_j = E(\phi_{j+1} \mid \mathcal{B}_j).$

Thus

(128.5)    $\phi_j \le \psi_j$

for each $j \ge 0$, because $\{\phi_j\}_{j=0}^\infty$ is a submartingale. Hence

(128.6)    $\psi_j = E(\phi_{j+1} \mid \mathcal{B}_j) \le E(\psi_{j+1} \mid \mathcal{B}_j),$

which implies that $\{\psi_j\}_{j=0}^\infty$ is also a submartingale. The doubling condition
implies that

(128.7)    $\phi_{j+1} \le C \, \psi_j$

for each $j \ge 0$, as in (128.2). If the $\phi_j$'s have bounded $L^p$ norm for some $p \ge 1$,
then the $\psi_j$'s have bounded $L^p$ norm as well, and with the same bound.

Let $V$ be a real or complex vector space with a norm $\|v\|$, and let $\{f_j(x)\}_{j=0}^\infty$
be a $V$-valued martingale on $X$ with respect to the $\mathcal{B}_j$'s, as in Section 103. Thus
$\phi_j(x) = \|f_j(x)\|$ is a nonnegative real-valued submartingale on $X$, and $\psi_j(x)$
can be defined as in the previous paragraph. Note that $f_0(x)$ is constant on $X$,
and let $t > \|f_0(x)\|$ be given. Put $\tau(x) = \infty$ when $\psi_j(x) \le t$ for each $j \ge 0$, and
otherwise let $\tau(x)$ be the smallest nonnegative integer $l$ such that $\psi_l(x) > t$.
This is a stopping time, as in Section 96. If $\tau_n(x) = \min(\tau(x), n)$, then

(128.8)    $g_n(x) = f_{\tau_n(x)}(x)$

is also a $V$-valued martingale on $X$, as before. This is basically the same as the
approximation to $\{f_j(x)\}_{j=0}^\infty$ described in Section 55, except that we use the
maximal function associated to $\psi_j$ instead of $\phi_j$. By construction,

(128.9)    $\|g_n(x)\| = \|f_n(x)\| \le \psi_n(x) \le t$

when $n < \tau(x)$, and

(128.10)    $\|g_n(x)\| = \|f_{\tau(x)}(x)\| \le C \, \psi_{\tau(x)-1}(x) \le C \, t$

when $0 < \tau(x) \le n$, because of the doubling condition. It follows that

(128.11)    $\|g_n(x)\| \le C \, t$

for every $x \in X$ and $n \ge 0$, since $\|g_n(x)\| = \|f_0(x)\| < t$ when $\tau(x) = 0$,
by hypothesis. This is analogous to (85.16), with the integrable function $h(x)$
replaced by $C \, t$. If $\phi_j(x) = \|f_j(x)\|$ has bounded $L^1$ norm, so that $\psi_j(x)$ has
bounded $L^1$ norm too, then the measure of the set where $\tau(x) < \infty$ can be
estimated as before. Of course, $g_n(x) = f_n(x)$ for every $n \ge 0$ when $\tau(x) = \infty$.
If every uniformly bounded $V$-valued martingale on $X$ converges almost
everywhere, then every $V$-valued martingale $\{f_j(x)\}_{j=1}^\infty$ such that $\|f_j(x)\|$ has
bounded $L^1$ norm also converges almost everywhere, as in Section 55.



129 Paths and martingales 

Let $V$ be a real or complex vector space with a norm $\|v\|$, and let $F$ be a
$V$-valued function on $[0,1]$. If $[a,b)$ is a dyadic subinterval of $[0,1)$ of length
$b - a = 2^{-j}$, then put

(129.1)    $f_j(x) = 2^j (F(b) - F(a))$

for every $x \in [a,b)$. This defines $f_j(x)$ as a $V$-valued function on $[0,1)$ which
is constant on the dyadic intervals of length $2^{-j}$. It is easy to see that the $f_j$'s
form a $V$-valued martingale on $[0,1)$ with respect to Lebesgue measure and the
$\sigma$-subalgebras of measurable sets generated by the partitions of $[0,1)$ by dyadic
intervals of length $2^{-j}$, as in Section 103. Note that

(129.2)    $\int_0^1 \|f_j(x)\| \, dx = \sum_{l=0}^{2^j - 1} \|F((l+1) 2^{-j}) - F(l \, 2^{-j})\|.$

If $F : [0,1] \to V$ has finite length $\Lambda$, then

(129.3)    $\int_0^1 \|f_j(x)\| \, dx \le \Lambda$

for each $j$. If $F$ is Lipschitz, then the $f_j$'s are uniformly bounded. If $F$ is
differentiable at $x$, then

(129.4)    $\lim_{j \to \infty} f_j(x) = F'(x).$
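
The following sketch illustrates (129.1) through (129.4) numerically for a path in $V = \mathbf{R}^2$ with the Euclidean norm. The quarter circle used as the test path, and all of the function names, are our own illustrative choices.

```python
import math


def dyadic_quotient(F, j, x):
    """f_j(x) from (129.1): 2^j (F(b) - F(a)), where [a, b) is the dyadic
    interval of length 2^-j that contains x, for x in [0, 1)."""
    l = math.floor(x * 2 ** j)
    a, b = l / 2 ** j, (l + 1) / 2 ** j
    return tuple(2 ** j * (fb - fa) for fa, fb in zip(F(a), F(b)))


def l1_norm_of_f_j(F, j):
    """Right side of (129.2), using the Euclidean norm on R^d."""
    total = 0.0
    for l in range(2 ** j):
        total += math.dist(F(l / 2 ** j), F((l + 1) / 2 ** j))
    return total


def quarter_circle(t):
    """Illustrative path on [0, 1] with length pi / 2."""
    return (math.cos(math.pi * t / 2), math.sin(math.pi * t / 2))
```

For this path the sums `l1_norm_of_f_j(quarter_circle, j)` increase with `j` and stay below the length $\pi/2$, in line with (129.3), and `dyadic_quotient(quarter_circle, 20, 0.3)` is close to the derivative of the path at $0.3$, in line with (129.4).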

If $V = L^1([0,1])$ and $F(x)$ is the indicator function of $[0,x]$, then $F$ is
a Lipschitz function on $[0,1]$ with values in $L^1([0,1])$, as in Section 51. The
corresponding martingale is the same as the one described in Section 100.

Now let $V = L^\infty(\mathbf{R})$, and let $\phi$ be a real or complex-valued Lipschitz function
on the real line. Also let $\phi_x$ be the translate of $\phi$ by $x$, so that $\phi_x(y) = \phi(y - x)$. If
$\phi$ is bounded, then $F(x) = \phi_x$ defines a Lipschitz mapping from $\mathbf{R}$ into $L^\infty(\mathbf{R})$,
as in Section 51. Otherwise, $F(x) = \phi_x - \phi$ defines a Lipschitz mapping from $\mathbf{R}$
into $L^\infty(\mathbf{R})$, using only the hypothesis that $\phi$ is Lipschitz on $\mathbf{R}$. The restriction
of $F(x)$ to $x \in [0,1]$ defines a martingale $\{f_j\}_j$ with values in $L^\infty(\mathbf{R})$ as before.

If $\phi$ is continuously-differentiable with uniformly continuous derivative, then
$F$ is differentiable at every $x \in \mathbf{R}$ as an $L^\infty(\mathbf{R})$-valued function on $\mathbf{R}$. In
this case, the derivative of $F$ at $x$ corresponds to $-1$ times the derivative of $\phi$
translated by $x$. If $x \in [0,1)$, then it is easy to see that $\{f_j(x)\}_{j=1}^\infty$ converges to
the same limit in $L^\infty(\mathbf{R})$. Conversely, if $\{f_j(x)\}_{j=1}^\infty$ converges in $L^\infty(\mathbf{R})$ for any
$x \in [0,1)$, then one can show that $\phi$ is continuously differentiable with uniformly
continuous derivative. This is analogous to the fact that $\phi$ is continuously-differentiable
with uniformly continuous derivative when $F$ is differentiable at
a single point, but slightly more complicated, since we are only using "dyadic"
difference quotients of $F$. If $\{f_j(x)\}_{j=1}^\infty$ converges in $L^\infty(\mathbf{R})$ for some $x \in [0,1)$,
then the limit determines a bounded uniformly continuous function $\psi$ on $\mathbf{R}$,
because $f_j(x)$ corresponds to a bounded Lipschitz function on $\mathbf{R}$ for each $j$ that
converges uniformly on $\mathbf{R}$ as $j \to \infty$. One can check that $\psi = -\phi_x'$ where $\phi_x$ is
differentiable, and then use the fact that Lipschitz functions are differentiable
almost everywhere and can be represented by integrals of their derivatives to
show that $\phi_x$ is continuously differentiable with derivative $-\psi$. Alternatively,
one can argue that $\phi_x' = -\psi$ in the sense of distributions, and hence that $\phi_x$ is
continuously differentiable with derivative $-\psi$.

Of course, one can just as well take $V$ to be the space $C_b(\mathbf{R})$ of bounded
continuous functions on the real line with the supremum norm here, which can
be identified with a closed linear subspace of $L^\infty(\mathbf{R})$. There is also a simple
way to embed $C_b(\mathbf{R})$ linearly and isometrically into $\ell^\infty$, by restricting a bounded
continuous function on the real line to the rationals, and then enumerating the
latter by a sequence to get bounded sequences of real or complex numbers. If $\phi$
has compact support, then one can view the restriction of $F(x)$ to $x \in [0,1]$ as a
Lipschitz mapping into the space of continuous functions on a sufficiently large
closed interval in the real line.

130 $L^\infty$ Norms

Let $(Y, \mathcal{B}, \nu)$ be a probability space, and suppose that $g \in L^\infty(Y)$, so that

(130.1)    $\|g\|_p = \Big(\int_Y |g(y)|^p \, d\nu(y)\Big)^{1/p} \le \|g\|_\infty$

for each $p < \infty$. Of course, $\|g\|_p$ is monotone increasing in $p$, by Jensen's
inequality, and it is well known and not difficult to show that

(130.2)    $\lim_{p \to \infty} \|g\|_p = \|g\|_\infty.$

Similarly, if $g$ is a measurable function on $Y$ that is not essentially bounded,
and if $g \in L^p(Y)$ for each $p < \infty$, then $\|g\|_p \to \infty$ as $p \to \infty$.
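
On a finite probability space, (130.1) and (130.2) are easy to observe numerically. In the sketch below, a function is stored as a list of values and the measure as a list of point masses adding up to 1; these conventions and the function names are assumptions made only for illustration.

```python
def lp_norm(g, weights, p):
    """||g||_p on a finite probability space with point masses `weights`.

    Points of measure zero are skipped, since they do not affect the integral.
    """
    return sum(w * abs(v) ** p for v, w in zip(g, weights) if w > 0) ** (1 / p)


def sup_norm(g, weights):
    """Essential supremum of |g|: the maximum over points of positive measure."""
    return max(abs(v) for v, w in zip(g, weights) if w > 0)
```

With `g = [1.0, -3.0, 2.0, 50.0]` and `weights = [0.5, 0.25, 0.25, 0.0]`, the value 50 sits on a set of measure zero, so the essential supremum is 3, and the norms `lp_norm(g, weights, p)` increase toward 3 as `p` grows.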

Let $(X, \mathcal{A}, \mu)$ be another probability space, and consider their Cartesian
product $X \times Y$, equipped with the product measure $\mu \times \nu$. If $F(x,y)$ is a
measurable function on $X \times Y$, then

(130.3)    $N_\infty(F)(x) = \lim_{p \to \infty} N_p(F)(x) = \lim_{p \to \infty} \Big(\int_Y |F(x,y)|^p \, d\nu(y)\Big)^{1/p}$

is a convenient way to express the norm of $F(x,y)$ as a function of $y$ in $L^\infty(Y)$
for each $x \in X$. More precisely, it is often helpful to restrict $p$ to be a positive
integer here, so that $N_\infty(F)(x)$ is expressed as the limit of a monotone increasing
sequence of functions. This makes it easy to derive properties of $N_\infty(F)(x)$ like
those for $N_p(F)(x)$ when $p < \infty$ discussed earlier.

If $f(x) = \{f_l(x)\}_{l=1}^\infty$ is a sequence of real or complex-valued measurable
functions on $X$, then the $\ell^\infty$ norm of $f(x)$ can be expressed as

(130.4)    $\|f(x)\|_\infty = \sup_{l \ge 1} |f_l(x)| = \lim_{n \to \infty} \max_{1 \le l \le n} |f_l(x)|,$

which implies that $\|f(x)\|_\infty$ is measurable on $X$. If $\|f(x)\|_\infty$ is integrable on $X$,
then it is very easy to see that the $\ell^\infty$ norm of the conditional expectation of the
$f_l(x)$'s with respect to some $\sigma$-subalgebra of $\mathcal{A}$ is bounded by the conditional
expectation of $\|f(x)\|_\infty$. One can simply use the fact that $|f_l(x)| \le \|f(x)\|_\infty$
for each $l$ to get that the absolute value of the conditional expectation of $f_l(x)$
is bounded by the conditional expectation of $\|f(x)\|_\infty$ for each $l$, and then take
the supremum over $l$.

131 Paths and measures 

Let $(V, \|v\|)$ be a real or complex Banach space, and let $F : [a,b] \to V$ be a path
of finite length. As in Section 05l, the one-sided limit $F(x+) = \lim_{y \to x+} F(y)$
exists for every $x \in [a,b)$, and similarly $F(x-) = \lim_{y \to x-} F(y)$ exists for every
$x \in (a,b]$. We can extend $F$ to the whole real line by putting $F(x) = F(a)$ when
$x < a$ and $F(x) = F(b)$ when $x > b$, so that $F(a-) = F(a)$ and $F(b+) = F(b)$.






As in Section we can put 



(131.1)    $\nu((r,t)) = F(t-) - F(r+)$

when $a \le r < t \le b$, and

(131.2)    $\nu([r,t]) = F(t+) - F(r-)$

when $a \le r \le t \le b$. Similarly, we can put

(131.3)    $\nu([r,t)) = F(t-) - F(r-), \quad \nu((r,t]) = F(t+) - F(r+)$

when $a \le r < t \le b$. This determines a finitely-additive $V$-valued measure on
the algebra $\mathcal{E}$ of subsets of $[a,b]$ that can be expressed as the union of finitely
many intervals, where the intervals may be open, closed, or half-open and half-closed.
Of course, this is a bit simpler when $F$ is continuous.

Let $\alpha(x)$ be the length of the restriction of $F$ to $[a,x]$ when $a \le x \le b$.
This can be extended to all $x \in \mathbf{R}$ by setting $\alpha(x) = 0$ when $x < a$ and
$\alpha(x) = \alpha(b)$ when $x > b$. Thus $\alpha(x)$ is a monotone increasing function on $\mathbf{R}$,
which determines a nonnegative Borel measure $\mu$ on $\mathbf{R}$ as in Section 44. It is
easy to see that

(131.4)    $\|\nu(A)\| \le \mu(A)$

for every $A \in \mathcal{E}$, because

(131.5)    $\|F(t) - F(r)\| \le \alpha(t) - \alpha(r)$

when $r \le t$. Note that $\alpha(t) - \alpha(r)$ is the same as the length of the restriction
of $F$ to $[r,t]$ when $r \le t$, as in Section 41.

Let $\mathcal{B}$ be the $\sigma$-algebra of Borel subsets of $[a,b]$. Thus $\mathcal{E} \subseteq \mathcal{B}$, and $\mathcal{B}$ is the
smallest $\sigma$-algebra of subsets of $[a,b]$ that contains $\mathcal{E}$. If $d(A,B) = \mu(A \, \triangle \, B)$ is
the distance between $A, B \in \mathcal{B}$ associated to $\mu$ as in Section 79, then it follows
that the closure of $\mathcal{E}$ in $\mathcal{B}$ with respect to $d(A,B)$ is equal to $\mathcal{B}$. This can also
be seen more directly from the construction of $\mu$.

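If $A, B \in \mathcal{E}$, then

(131.6)    $\nu(A) - \nu(B) = (\nu(A \setminus B) + \nu(A \cap B)) - (\nu(B \setminus A) + \nu(A \cap B)) = \nu(A \setminus B) - \nu(B \setminus A),$

and hence

(131.7)    $\|\nu(A) - \nu(B)\| \le \|\nu(A \setminus B)\| + \|\nu(B \setminus A)\| \le \mu(A \setminus B) + \mu(B \setminus A) = d(A,B),$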

by (131.4). This permits $\nu$ to be extended to a $V$-valued function on $\mathcal{B}$, using
uniform continuity and completeness. More precisely, if $A \in \mathcal{B}$, then there
is a sequence $\{A_j\}_{j=1}^\infty$ of elements of $\mathcal{E}$ that converges to $A$ with respect to
$d(\cdot,\cdot)$. This implies that $\{\nu(A_j)\}_{j=1}^\infty$ is a Cauchy sequence in $V$, because of
the uniform continuity of $\nu$ with respect to $d(\cdot,\cdot)$ just established. It follows
that $\{\nu(A_j)\}_{j=1}^\infty$ converges in $V$, because $V$ is complete, and $\nu(A)$ is defined to
be the limit of this sequence. One can also check that this does not depend on
the particular sequence $\{A_j\}_{j=1}^\infty$ converging to $A$, using the uniform continuity
of $\nu$ with respect to $d(\cdot,\cdot)$ again. Note that this extension satisfies

(131.8)    $\|\nu(A) - \nu(B)\| \le d(A,B)$

for every $A, B \in \mathcal{B}$, since this holds when $A, B \in \mathcal{E}$ and is preserved under
limits. In particular, (131.4) holds for every $A \in \mathcal{B}$.

Let $A, B \in \mathcal{B}$ be given, and let $\{A_j\}_{j=1}^\infty$, $\{B_j\}_{j=1}^\infty$ be sequences of elements
of $\mathcal{E}$ that converge to $A$, $B$ with respect to $d(\cdot,\cdot)$, respectively. This implies that
$\{A_j \cap B_j\}_{j=1}^\infty$ converges to $A \cap B$, and that $\{A_j \cup B_j\}_{j=1}^\infty$ converges to $A \cup B$,
as in Section 75. Of course,

(131.9)    $\nu(A_j) + \nu(B_j) = \nu(A_j \cap B_j) + \nu(A_j \cup B_j)$

for each $j$, because $\nu$ is finitely additive on $\mathcal{E}$. Taking the limit as $j \to \infty$, we
get that

(131.10)    $\nu(A) + \nu(B) = \nu(A \cap B) + \nu(A \cup B),$

because of (131.8). This shows that $\nu$ is finitely additive on $\mathcal{B}$.

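If $E_1, E_2, \ldots$ is a sequence of elements of $\mathcal{B}$ that are pairwise-disjoint, then

(131.11)    $\sum_{l=1}^\infty \|\nu(E_l)\| \le \sum_{l=1}^\infty \mu(E_l) = \mu\Big(\bigcup_{l=1}^\infty E_l\Big),$

since (131.4) holds for every $A \in \mathcal{B}$. Moreover,

(131.12)    $\Big\|\nu\Big(\bigcup_{l=n+1}^\infty E_l\Big)\Big\| \le \mu\Big(\bigcup_{l=n+1}^\infty E_l\Big) \to 0$

as $n \to \infty$. This implies that

(131.13)    $\sum_{l=1}^\infty \nu(E_l) = \nu\Big(\bigcup_{l=1}^\infty E_l\Big),$

because we already know that $\nu$ is finitely additive on $\mathcal{B}$.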



132 Paths and integrals 

Let $(V, \|v\|)$ be a real or complex Banach space, and let $F : [a,b] \to V$ be a path
of finite length, as in the previous section. Also let $\phi$ be a continuous real or
complex-valued function on $[a,b]$, as appropriate. Suppose that $\mathcal{P} = \{t_j\}_{j=0}^n$ is
a partition of $[a,b]$, and that $t_{j-1} \le r_j \le t_j$ for $j = 1, \ldots, n$, and consider

(132.1)    $\sum_{j=1}^n \phi(r_j) (F(t_j) - F(t_{j-1})).$

This is an approximation to the Riemann-Stieltjes integral of $\phi$ with respect to
$F$, whose existence and basic properties will be discussed now. Basically, this
is very similar to the Riemann-Stieltjes integral of a continuous function with
respect to a real or complex-valued function of bounded variation on $[a,b]$.

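If $t_{j-1} \le r_j' \le t_j$ is another collection of intermediate points, then the
difference of the corresponding sums can be expressed as

(132.2)    $\sum_{j=1}^n \phi(r_j) (F(t_j) - F(t_{j-1})) - \sum_{j=1}^n \phi(r_j') (F(t_j) - F(t_{j-1})) = \sum_{j=1}^n (\phi(r_j) - \phi(r_j')) (F(t_j) - F(t_{j-1})).$

Of course, $\phi$ is uniformly continuous on $[a,b]$, since it is continuous and $[a,b]$ is
compact. Thus for each $\epsilon > 0$ there is a $\delta > 0$ such that

(132.3)    $|\phi(r) - \phi(r')| < \epsilon$

when $r, r' \in [a,b]$ and $|r - r'| < \delta$. In particular,

(132.4)    $\Big\|\sum_{j=1}^n (\phi(r_j) - \phi(r_j')) (F(t_j) - F(t_{j-1}))\Big\| \le \sum_{j=1}^n |\phi(r_j) - \phi(r_j')| \, \|F(t_j) - F(t_{j-1})\| \le \epsilon \, \Lambda$

when the mesh size $\max_{1 \le j \le n} (t_j - t_{j-1})$ of $\mathcal{P}$ is strictly less than $\delta$, where $\Lambda$
denotes the length of $F$ on $[a,b]$.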

If $\mathcal{P}$, $\mathcal{P}'$ are two partitions of $[a,b]$ with sufficiently small mesh size, then one
can check that the difference between the corresponding sums (132.1) is also
small. As usual, it is helpful to let $\mathcal{P}''$ be a common refinement of $\mathcal{P}$ and $\mathcal{P}'$, and
to look at the differences between the sums corresponding to $\mathcal{P}$, $\mathcal{P}'$ and the sum
corresponding to $\mathcal{P}''$. These differences can be estimated in much the same way
as in the previous paragraph, using the uniform continuity of $\phi$. If $\mathcal{P}_1, \mathcal{P}_2, \ldots$
is a sequence of partitions of $[a,b]$ whose mesh sizes are converging to 0, then
the corresponding sums form a Cauchy sequence in $V$, and hence converge,
by completeness of $V$. The limit does not depend on the particular sequence
of partitions, because the difference between the sums associated to partitions
with small mesh size is small, as before.

The Riemann-Stieltjes integral

(132.5)    $\int_a^b \phi \, dF$

of $\phi$ with respect to $F$ is the limit of the sums (132.1) described in the previous
paragraph. Observe that

(132.6)    $\Big\|\sum_{j=1}^n \phi(r_j) (F(t_j) - F(t_{j-1}))\Big\| \le \Big(\sup_{a \le r \le b} |\phi(r)|\Big) \Lambda$

for every partition $\mathcal{P}$ of $[a,b]$, where $\Lambda$ is the length of $F$ on $[a,b]$, and hence

(132.7)    $\Big\|\int_a^b \phi \, dF\Big\| \le \Big(\sup_{a \le r \le b} |\phi(r)|\Big) \Lambda.$

If $\alpha(x)$ is the length of the restriction of $F$ to $[a,x]$ for each $x \in [a,b]$, then one
can improve this to get that

(132.8)    $\Big\|\int_a^b \phi \, dF\Big\| \le \int_a^b |\phi| \, d\alpha,$

where the right side is a classical Riemann-Stieltjes integral. This is a more
localized version of (132.7), which can be derived using the analogue of (132.7)
on small subintervals of $[a,b]$. As in Section 031 the Riemann-Stieltjes integral
of a continuous function on $[a,b]$ with respect to $\alpha$ can be extended to the
Lebesgue-Stieltjes integral with respect to a positive Borel measure $\mu_\alpha$ on $[a,b]$.
As usual, continuous functions on $[a,b]$ form a dense linear subspace of $L^1(\mu_\alpha)$.
Using (132.8), the Riemann-Stieltjes integral of $\phi$ with respect to $F$ can be
extended to $\phi \in L^1(\mu_\alpha)$. More precisely, if $\phi$ is an integrable function on $[a,b]$
with respect to $\mu_\alpha$, then there is a sequence $\{\phi_j\}_{j=1}^\infty$ of continuous functions
on $[a,b]$ which converge to $\phi$ in $L^1(\mu_\alpha)$. Because of (132.8), the corresponding
sequence of Riemann-Stieltjes integrals of the $\phi_j$'s with respect to $F$ forms a
Cauchy sequence in $V$, and therefore converges, by completeness. One can also
check that the limit depends only on $\phi$, and not on the particular sequence of
continuous approximations $\{\phi_j\}_{j=1}^\infty$. Hence the Lebesgue-Stieltjes integral of $\phi$
with respect to $F$ may be defined as this limit in $V$. Of course, this is very
similar to the argument in the previous section.



133 Integrating vector measures 

Let $(X, \mathcal{A})$ be a measurable space, and let $(V, \|v\|)$ be a real or complex Banach
space. Also let $\mu$ be a $V$-valued function on $\mathcal{A}$ such that for any sequence
$A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of $X$,

(133.1)    $\sum_{j=1}^\infty \|\mu(A_j)\|$

converges, and

(133.2)    $\sum_{j=1}^\infty \mu(A_j) = \mu\Big(\bigcup_{j=1}^\infty A_j\Big).$

As in Section 37, there is a nonnegative real-valued measure $\|\mu\|$ on $X$ associated
to $\mu$ such that

(133.3)    $\|\mu(A)\| \le \|\mu\|(A)$

for each $A \in \mathcal{A}$, and $\|\mu\|(X) < \infty$.






Suppose that $f(x)$ is a real or complex-valued measurable simple function
on $X$, as appropriate. This means that there are finitely many pairwise-disjoint
measurable subsets $A_1, \ldots, A_n$ of $X$ and real or complex numbers $a_1, \ldots, a_n$
such that

(133.4)    $f(x) = \sum_{j=1}^n a_j \, \mathbf{1}_{A_j}(x).$

Here $\mathbf{1}_A(x)$ is the indicator function associated to $A \subseteq X$ on $X$, equal to 1 when
$x \in A$ and to 0 when $x \in X \setminus A$. The integral of $f$ with respect to $\mu$ is given by

(133.5)    $\int_X f \, d\mu = \sum_{j=1}^n a_j \, \mu(A_j),$

and satisfies

(133.6)    $\Big\|\int_X f \, d\mu\Big\| \le \sum_{j=1}^n |a_j| \, \|\mu(A_j)\| \le \sum_{j=1}^n |a_j| \, \|\mu\|(A_j) = \int_X |f| \, d\|\mu\|.$

More precisely, (133.5) does not depend on the particular representation (133.4)
of $f$, and it also works when the $A_j$'s are not pairwise disjoint.

Let $f(x)$ be an integrable real or complex-valued function on $X$ with respect
to $\|\mu\|$, as appropriate, and let $\{f_l\}_{l=1}^\infty$ be a sequence of measurable simple
functions on $X$ that converge to $f$ in $L^1(X, \|\mu\|)$. Using (133.6), one can check
that

(133.7)    $\Big\{\int_X f_l \, d\mu\Big\}_{l=1}^\infty$

is a Cauchy sequence in $V$, and hence converges, by completeness. The integral
of $f$ with respect to $\mu$ can be defined by

(133.8)    $\int_X f \, d\mu = \lim_{l \to \infty} \int_X f_l \, d\mu.$

As usual, one can also check that this does not depend on the sequence $\{f_l\}_{l=1}^\infty$
of simple functions converging to $f$, and that

(133.9)    $\Big\|\int_X f \, d\mu\Big\| \le \int_X |f| \, d\|\mu\|,$

by (133.6).

If $\lambda$ is a bounded linear functional on $V$, then

(133.10)    $\mu_\lambda(A) = \lambda(\mu(A))$

defines a real or complex measure on $X$, as appropriate. Note that

(133.11)    $|\mu_\lambda(A)| = |\lambda(\mu(A))| \le \|\lambda\|_* \, \|\mu(A)\| \le \|\lambda\|_* \, \|\mu\|(A),$

and hence

(133.12)    $|\mu_\lambda|(A) \le \|\lambda\|_* \, \|\mu\|(A)$

for every $A \in \mathcal{A}$. If $f$ is a measurable simple function on $X$, then it is easy to
see that

(133.13)    $\lambda\Big(\int_X f \, d\mu\Big) = \int_X f \, d\mu_\lambda$

for every $\lambda \in V^*$. This also works when $f \in L^1(X, \|\mu\|)$, by approximating $f$ by
simple functions, as in the previous paragraph. The integral of $f$ with respect
to $\mu$ is uniquely determined by this property, because of the Hahn-Banach
theorem.
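
When $X$ is a finite set and $V = \mathbf{R}^d$ with the Euclidean norm, these definitions reduce to finite sums, which makes (133.9) and (133.13) easy to check directly. In the sketch below, the measure is determined by the vectors $\mu(\{k\})$, stored in a list called `atoms`, and $\|\mu\|(\{k\})$ is the norm of $\mu(\{k\})$; this representation and the function names are assumptions made for illustration.

```python
import math


def integrate(f, atoms):
    """Integral of f over X = {0, ..., m-1} against the vector measure
    with mu({k}) = atoms[k], a vector in R^d stored as a tuple."""
    dim = len(atoms[0])
    return tuple(sum(f[k] * atoms[k][i] for k in range(len(atoms))) for i in range(dim))


def total_variation_integral(f, atoms):
    """Integral of |f| against ||mu||, where ||mu||({k}) is the Euclidean norm of atoms[k]."""
    return sum(abs(f[k]) * math.hypot(*atoms[k]) for k in range(len(atoms)))


def apply_functional(c, v):
    """The linear functional lambda(v) = <c, v> on R^d."""
    return sum(ci * vi for ci, vi in zip(c, v))
```

For example, with `atoms = [(1.0, 0.0), (-1.0, 2.0), (0.5, -0.5)]`, `f = [2.0, -1.0, 4.0]`, and `c = (3.0, -1.0)`, applying the functional to `integrate(f, atoms)` gives the same number as integrating `f` against the scalar measure with atoms `apply_functional(c, atoms[k])`.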



134 Measures and orthogonality 

Let $(X, \mathcal{A})$ be a measurable space, and let $(V, \langle v, w \rangle)$ be a real or complex Hilbert
space. Also let $\nu(A)$ be a finitely-additive $V$-valued measure on $(X, \mathcal{A})$ such that

(134.1)    $\langle \nu(A), \nu(B) \rangle = 0$

whenever $A$, $B$ are disjoint measurable subsets of $X$. In particular,

(134.2)    $\|\nu(A \cup B)\|^2 = \|\nu(A)\|^2 + \|\nu(B)\|^2$

when $A$, $B$ are disjoint. It follows that

(134.3)    $\sum_{j=1}^n \|\nu(A_j)\|^2 + \Big\|\nu\Big(\bigcup_{j=n+1}^\infty A_j\Big)\Big\|^2 = \Big\|\nu\Big(\bigcup_{j=1}^\infty A_j\Big)\Big\|^2$

for any sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of $X$ and
$n \ge 1$, and hence

(134.4)    $\sum_{j=1}^n \|\nu(A_j)\|^2 \le \Big\|\nu\Big(\bigcup_{j=1}^\infty A_j\Big)\Big\|^2.$

Thus

(134.5)    $\sum_{j=1}^\infty \|\nu(A_j)\|^2 \le \Big\|\nu\Big(\bigcup_{j=1}^\infty A_j\Big)\Big\|^2,$

which implies that $\sum_{j=1}^\infty \nu(A_j)$ converges in $V$ when $A_1, A_2, \ldots$ are disjoint. In
this case, we ask also that

(134.6)    $\sum_{j=1}^\infty \nu(A_j) = \nu\Big(\bigcup_{j=1}^\infty A_j\Big),$

which implies that

(134.7)    $\sum_{j=1}^\infty \|\nu(A_j)\|^2 = \Big\|\nu\Big(\bigcup_{j=1}^\infty A_j\Big)\Big\|^2.$

This shows that $\|\nu(A)\|^2$ is a nonnegative real-valued measure on $X$ under these
conditions, which may be denoted $\|\nu\|^2$.

As a basic example of this type of situation, let $\mu$ be a nonnegative real-valued
measure on $X$, and consider $V = L^2(X, \mu)$, with the standard integral
inner product. Let $g \in L^2(X, \mu)$ be given, and let $\nu_g$ be the $L^2(X, \mu)$-valued
function on $\mathcal{A}$ defined by

(134.8)    $\nu_g(A) = g \, \mathbf{1}_A.$

Equivalently, $\nu_g(A)$ is the function on $X$ equal to $g$ on $A$ and to 0 on $X \setminus A$ for
each measurable set $A \subseteq X$. In particular,

(134.9)    $\|\nu_g(A)\|^2 = \int_A |g|^2 \, d\mu.$

It is easy to see that $\nu_g$ satisfies all of the conditions described in the previous
paragraph.

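Let $V$ be any Hilbert space again, and let $\nu$ be a $V$-valued function on $\mathcal{A}$ that
satisfies the same conditions as before. Let $A_1, \ldots, A_n$ be finitely many pairwise-disjoint
measurable subsets of $X$, and let $a_1, \ldots, a_n$ be real or complex numbers,
as appropriate. If $f = \sum_{j=1}^n a_j \, \mathbf{1}_{A_j}$ is the corresponding simple function, then
its integral with respect to $\nu$ is given by

(134.10)    $\int_X f \, d\nu = \sum_{j=1}^n a_j \, \nu(A_j).$

In this case,

(134.11)    $\Big\|\int_X f \, d\nu\Big\|^2 = \sum_{j=1}^n |a_j|^2 \, \|\nu(A_j)\|^2 = \int_X |f|^2 \, d\|\nu\|^2,$

because $\nu(A_1), \ldots, \nu(A_n)$ are orthogonal to each other in $V$.
Using standard arguments based on continuity and completeness, the integral
of $f$ with respect to $\nu$ can be extended to an isometric linear mapping from
$L^2(X, \|\nu\|^2)$ into $V$.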

Suppose that $V = L^2(X, \mu)$ for some nonnegative real-valued measure $\mu$ on
$X$, and that $\nu = \nu_g$ for some $g \in L^2(X, \mu)$. If $f$ is a measurable simple function
on $X$, then it is easy to see that

(134.12)    $\int_X f \, d\nu_g = f \, g$

as an element of $L^2(X, \mu)$, and that

(134.13)    $\Big\|\int_X f \, d\nu_g\Big\|^2 = \int_X |f|^2 \, |g|^2 \, d\mu.$

If $f \in L^2(X, \|\nu_g\|^2)$, then $f \, g \in L^2(X, \mu)$, and the same statements hold.
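
The example $\nu_g(A) = g \, \mathbf{1}_A$ can be tried out on a finite set $X = \{0, \ldots, m-1\}$ with point masses, where elements of $L^2(X, \mu)$ are simply lists of real numbers. The following sketch uses that representation; it and the function names are our own conventions.

```python
def nu_g(g, subset):
    """nu_g(A) = g * 1_A as a function on X = {0, ..., m-1}."""
    return [g[k] if k in subset else 0.0 for k in range(len(g))]


def inner(u, v, mu):
    """Inner product of real-valued u, v in L^2(X, mu), with point masses mu[k]."""
    return sum(mu[k] * u[k] * v[k] for k in range(len(mu)))


def integrate_simple(coeffs, sets, g):
    """The integral of the simple function sum_j a_j 1_{A_j} with respect to nu_g,
    computed as sum_j a_j nu_g(A_j) for pairwise-disjoint sets A_j."""
    total = [0.0] * len(g)
    for a, A in zip(coeffs, sets):
        for k, value in enumerate(nu_g(g, A)):
            total[k] += a * value
    return total
```

With `mu = [0.5, 1.0, 2.0, 0.25]`, `g = [1.0, -2.0, 0.5, 4.0]`, and the disjoint sets `{0, 1}` and `{2, 3}`, the values of $\nu_g$ on the two sets are orthogonal, the integral of $f = 3 \cdot \mathbf{1}_{\{0,1\}} - \mathbf{1}_{\{2,3\}}$ is the product $f \, g$, and its squared norm equals the sum of $|a_j|^2 \, \|\nu_g(A_j)\|^2$.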

135 Paths and orthogonality 

Let $(V, \langle v, w \rangle)$ be a real or complex Hilbert space, and let $p(t)$ be a $V$-valued
function on a closed interval $[a,b]$ in the real line. Suppose that

(135.1)    $\langle p(t_2) - p(t_1), p(t_3) - p(t_2) \rangle = 0$

whenever $a \le t_1 \le t_2 \le t_3 \le b$, which implies that

(135.2)    $\langle p(t_2) - p(t_1), p(t_4) - p(t_3) \rangle = 0$

when $t_3 \le t_4 \le b$ too. More precisely, (135.1) also holds with $t_3$ replaced by $t_4$
in this case, and (135.2) follows by expressing $p(t_4) - p(t_3)$ as the difference of
$p(t_4) - p(t_2)$ and $p(t_3) - p(t_2)$. If we put

(135.3)    $\alpha(t) = \|p(t) - p(a)\|^2$

for $a \le t \le b$, then

(135.4)    $\alpha(t) = \|p(r) - p(a)\|^2 + \|p(t) - p(r)\|^2 = \alpha(r) + \|p(t) - p(r)\|^2 \ge \alpha(r)$

when $a \le r \le t \le b$, so that $\alpha(t)$ is monotone increasing on $[a,b]$. One can show
that the one-sided limit $p(t+)$ exists when $a \le t < b$, and similarly that $p(t-)$
exists when $a < t \le b$, in analogy with Section SS]. Note that $p(t)$ is continuous
at the same points where $\alpha(t)$ is continuous, because of (135.4). It is convenient
to extend $p(t)$ to the whole real line, by putting $p(t) = p(a)$ when $t < a$ and
$p(t) = p(b)$ when $t > b$, so that $p(a-) = p(a)$ and $p(b+) = p(b)$ are defined as
well. We can extend $\alpha(t)$ to $\mathbf{R}$ in the same way, so that $\alpha(t) = 0$ when $t < a$
and $\alpha(t) = \alpha(b)$ when $t > b$.
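
A simple family of paths satisfying (135.1) comes from indicator functions: $p(t) = \mathbf{1}_{[a,t]}$ in $L^2(\mu)$. The sketch below takes $\mu$ to be a finite sum of point masses, so that an element of $L^2(\mu)$ can be stored as the list of its values at the atoms. The atoms, their weights, and the function names are hypothetical choices for illustration.

```python
def make_path(jump_times):
    """p(t) = 1_{[a, t]} in L^2(mu), where mu is a sum of point masses at `jump_times`.

    An element of L^2(mu) is stored as the list of its values at the atoms.
    """
    def p(t):
        return [1.0 if s <= t else 0.0 for s in jump_times]
    return p


def inner(u, v, weights):
    """Inner product in L^2(mu) for real-valued u, v, with mu({s_k}) = weights[k]."""
    return sum(w * x * y for w, x, y in zip(weights, u, v))


def increment(p, r, t):
    """The increment p(t) - p(r)."""
    return [y - x for x, y in zip(p(r), p(t))]


def alpha(p, a, t, weights):
    """(135.3): alpha(t) = ||p(t) - p(a)||^2."""
    d = increment(p, a, t)
    return inner(d, d, weights)
```

Here the increment $p(t) - p(r)$ is the indicator function of $(r,t]$, so increments over successive intervals have disjoint supports and are orthogonal, and $\alpha(t)$ is the $\mu$-measure of $(a,t]$.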

As in Sections 44 and 131, put

(135.5)    $\nu((r,t)) = p(t-) - p(r+)$

and

(135.6)    $\nu([r,t)) = p(t-) - p(r-), \quad \nu((r,t]) = p(t+) - p(r+)$

when $a \le r < t \le b$, and

(135.7)    $\nu([r,t]) = p(t+) - p(r-)$

when $a \le r \le t \le b$. This determines a finitely-additive $V$-valued measure on
the algebra $\mathcal{E}$ of subsets of $[a,b]$ that can be expressed as the union of finitely
many intervals, where the intervals may be open, closed, or half-open and half-closed.
By hypothesis,

(135.8)    $\langle \nu(I), \nu(I') \rangle = 0$

for every pair $I$, $I'$ of disjoint subintervals of $[a,b]$. If $\mu_\alpha$ is the nonnegative
Borel measure associated to $\alpha(t)$ as in Section 44, then

(135.9)    $\|\nu(A)\|^2 = \mu_\alpha(A)$

for every subinterval $A$ of $[a,b]$. This also works when $A \in \mathcal{E}$, because $A$ is then
the union of finitely many pairwise-disjoint subintervals $I_1, \ldots, I_n$ of $[a,b]$, and
$\nu(I_1), \ldots, \nu(I_n)$ are orthogonal to each other in $V$.

Let

(135.10)    $f(x) = \sum_{j=1}^n c_j \, \mathbf{1}_{I_j}(x)$

be a step function on $[a,b]$, where $I_1, \ldots, I_n$ are pairwise-disjoint subintervals of
$[a,b]$, and $c_1, \ldots, c_n$ are real or complex numbers, as appropriate. The integral
of $f$ with respect to $\nu$ can be defined by

(135.11)    $\int_a^b f \, d\nu = \sum_{j=1}^n c_j \, \nu(I_j).$

In this case,

(135.12)    $\Big\|\int_a^b f \, d\nu\Big\|^2 = \sum_{j=1}^n |c_j|^2 \, \|\nu(I_j)\|^2,$

because $\nu(I_1), \ldots, \nu(I_n)$ are orthogonal to each other in $V$. Hence

(135.13)    $\Big\|\int_a^b f \, d\nu\Big\|^2 = \int_a^b |f|^2 \, d\mu_\alpha,$

as in (135.9). Thus the integral of $f$ with respect to $\nu$ defines a linear isometry
from the subspace of $L^2([a,b], \mu_\alpha)$ consisting of step functions into $V$. This
can be extended to a linear isometry from $L^2([a,b], \mu_\alpha)$ into $V$, by standard
arguments of continuity and completeness. In particular, $\nu$ can be extended to
a $V$-valued Borel measure on $[a,b]$ as in the previous section, by applying this
extension to indicator functions of measurable subsets of $[a,b]$.

If $\mu$ is a finite nonnegative Borel measure on $[a,b]$, then $p(t) = \mathbf{1}_{[a,t]}$ defines
a mapping from $[a,b]$ into $L^2([a,b], \mu)$ that satisfies the conditions mentioned at
the beginning of the section. One could also use the indicator function associated
to $(a,t)$, $[a,t)$, or $(a,t]$, and the corresponding differences of one-sided limits of
$p$ would be the same. Note that these indicator functions are already the same
in $L^2([a,b], \mu)$ when $\mu(\{x\}) = 0$ for each $x \in [a,b]$, in which case $p$ is continuous.
One can check that $\mu_\alpha = \mu$ in this situation, and that the embedding described
in the preceding paragraph reduces to the identity mapping on $L^2([a,b], \mu)$.



136 Minkowski's integral inequality 

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be measure spaces, with finite or $\sigma$-finite measure. If
$F(x,y)$ is a nonnegative measurable function on the Cartesian product $X \times Y$
and $1 \le p < \infty$, then Minkowski's integral inequality states that

(136.1)    $\Big(\int_Y \Big(\int_X F(x,y) \, d\mu(x)\Big)^p d\nu(y)\Big)^{1/p} \le \int_X \Big(\int_Y F(x,y)^p \, d\nu(y)\Big)^{1/p} d\mu(x).$

This is an integrated version of the triangle inequality for the $L^p$ norm, which
is also known as Minkowski's inequality. Note that one has equality in (136.1)
when $p = 1$, by Fubini's theorem. We have basically encountered versions of
this already in connection with conditional expectation, and we would like to
mention a couple of other approaches now.
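
For finite sets $X$ and $Y$ with point masses, both sides of (136.1) are finite sums and can be compared directly. The sketch below is only a numerical illustration; the representation of $F$ as a nested list `F[i][j]` and the function name are our own.

```python
def minkowski_sides(F, mu, nu, p):
    """Both sides of (136.1) for F[i][j] >= 0 on finite sets X, Y
    with point masses mu[i] on X and nu[j] on Y."""
    left = sum(
        nu[j] * sum(mu[i] * F[i][j] for i in range(len(mu))) ** p
        for j in range(len(nu))
    ) ** (1 / p)
    right = sum(
        mu[i] * sum(nu[j] * F[i][j] ** p for j in range(len(nu))) ** (1 / p)
        for i in range(len(mu))
    )
    return left, right
```

For instance, with `F = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]`, `mu = [0.5, 2.0]`, and `nu = [1.0, 0.25, 3.0]`, the two sides agree when `p = 1`, and the left side is no larger than the right side for `p = 1.5, 2, 5`.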

Let $A_1, \ldots, A_n$ be finitely many pairwise-disjoint measurable subsets of $X$
whose union is equal to $X$. If $F(x,y)$ is constant in $x$ on each $A_j$, then (136.1)
reduces to the ordinary Minkowski inequality for finite sums. Otherwise, one can
get (136.1) by approximating $F(x,y)$ by functions of this type. This is analogous
to the earlier discussion of "nice functions" on $X \times Y$, but with the roles of $X$
and $Y$ exchanged. A key point is that measurable subsets of $X \times Y$ with finite
measure can be approximated by finite unions of measurable rectangles, as in
Section [TH

Alternatively, put

(136.2)    $N_p(F)(x) = \Big(\int_Y F(x,y)^p \, d\nu(y)\Big)^{1/p},$

as before. If $\mu$ is a probability measure on $X$, then

(136.3)    $\Big(\int_X F(x,y) \, d\mu(x)\Big)^p \le \int_X F(x,y)^p \, d\mu(x)$

for each $y \in Y$, by Jensen's inequality. Hence

(136.4)    $\int_Y \Big(\int_X F(x,y) \, d\mu(x)\Big)^p d\nu(y) \le \int_Y \int_X F(x,y)^p \, d\mu(x) \, d\nu(y) = \int_X N_p(F)(x)^p \, d\mu(x),$

by Fubini's theorem. If $N_p(F)(x) \le 1$ for $\mu$-almost every $x \in X$, then it follows
that

(136.5)    $\int_Y \Big(\int_X F(x,y) \, d\mu(x)\Big)^p d\nu(y) \le 1.$

This may be considered as a special case of (136.1), and the general case may
be derived from it using homogeneity, as follows. If the right side of (136.1)
is equal to 0, then $F(x,y) = 0$ almost everywhere on $X \times Y$, the left side of
(136.1) is also equal to 0, and there is nothing to do. There is also nothing to
do when the right side of (136.1) is $+\infty$. Thus we may suppose that the right
side of (136.1) is positive and finite, and we can even take it to be equal to 1, by
multiplying $F$ by a positive constant. We may also suppose that $N_p(F)(x) > 0$
for every $x \in X$, because the $x \in X$ for which $N_p(F)(x) = 0$ do not play a role
in (136.1). If we put

(136.6)    $F'(x,y) = N_p(F)(x)^{-1} \, F(x,y),$

then $N_p(F')(x) = 1$ for every $x \in X$ automatically. Similarly, if we put

(136.7)    $\mu'(A) = \int_A N_p(F)(x) \, d\mu(x),$

then $\mu'$ is a probability measure on $X$, because the right side of (136.1) is
supposed to be equal to 1. The special case of Minkowski's integral inequality
under consideration implies that

(136.8)    $\int_Y \Big(\int_X F'(x,y) \, d\mu'(x)\Big)^p d\nu(y) \le 1.$

This implies that the left side of (136.1) is less than or equal to 1, as desired,
because $F'(x,y) \, d\mu'(x) = F(x,y) \, d\mu(x)$.

Let $N_\infty(F)(x)$ be the essential supremum of $F(x,y)$ over $y \in Y$. The $p = \infty$
version of (136.1) states that the essential supremum of

(136.9)    $\int_X F(x,y) \, d\mu(x)$

over $y \in Y$ is less than or equal to

(136.10)    $\int_X N_\infty(F)(x) \, d\mu(x).$

If $Y$ is a probability space, then this can be obtained from (136.1) by taking the
limit as $p \to \infty$ with $p \in \mathbf{Z}_+$, as in Section 130. Otherwise, one can reduce to
the case of probability spaces by approximating $Y$ by subsets of finite measure,
or using a positive weight on $Y$ with integral 1. Alternatively, if $N_\infty(F)(x) \le 1$
for almost every $x \in X$, then $F(x,y) \le 1$ for almost every $(x,y) \in X \times Y$,
by Fubini's theorem. If $\mu$ is a probability measure on $X$, then it follows that
(136.9) is less than or equal to 1 for almost every $y \in Y$. As in the previous
paragraph, this may be considered as a special case of the desired estimate, and
the general case can be derived from it in the same way as before.



137 Spaces of measures 

Let (X, A) be a measurable space, and let (V, \\v\\ ) be a real or complex Banach 
space. Consider the space Ai(X, V) of V- valued functions /i on A such that 

oo 

(137.1) 5^11^(^)11 < oo 
and 

oo oo 

(137.2) £>(Aj) = /*( IMi) 

3=1 J=l 

for every sequence A\, A2, ... of pairwise-disjoint measurable subsets of X. As 
usual, the first condition already implies that MAj) converges in V . The 

second condition is equivalent to asking that /1 be finitely additive and have the 
continuity property that 

n 00 

(137.3) Um M (U^) = KLMi)> 

j=i 3=1 
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just as for real or complex measures. 

Note that $\mathcal{M}(X, V)$ is a vector space over the real or complex numbers, as 
appropriate. If $\mu \in \mathcal{M}(X, V)$, then $\rho(A) = \|\mu(A)\|$ satisfies the 
conditions in Section 133 and $\|\mu\|(A) = \rho^*(A)$ is a finite nonnegative 
measure on X, as in Section 37. By construction, 

(137.4) $\|\mu(A)\| \le \|\mu\|(A)$

for every measurable set $A \subseteq X$, and $\|\mu\|$ is the smallest nonnegative 
measure on X with this property, as in Section [3S]. It is easy to check that 
$\|\mu\|(X)$ defines a norm on $\mathcal{M}(X, V)$. 

Suppose that $\mu_1, \mu_2, \ldots$ is a sequence of elements of $\mathcal{M}(X, V)$ 
which is a Cauchy sequence with respect to this norm. Thus for each $\epsilon > 0$ 
there is an $L \ge 1$ such that 

(137.5) $\|\mu_l - \mu_n\|(X) < \epsilon$

for every $l, n \ge L$. Of course, 

(137.6) $\|\mu_l(A) - \mu_n(A)\| \le \|\mu_l - \mu_n\|(A) \le \|\mu_l - \mu_n\|(X)$

for every measurable set $A \subseteq X$ and $l, n \ge 1$, which implies that 
$\{\mu_l(A)\}_{l=1}^\infty$ is a Cauchy sequence in V for every $A \in \mathcal{A}$. 
Let $\mu(A)$ be the limit of this sequence in V, which converges by completeness. 
Note that $\{\mu_l(A)\}_{l=1}^\infty$ actually converges to $\mu(A)$ uniformly on 
$\mathcal{A}$, because the Cauchy condition holds uniformly over $A \in \mathcal{A}$. 

If $A_1, A_2, \ldots$ is a sequence of pairwise-disjoint measurable subsets of X, then 

(137.7) $\sum_{j=1}^\infty \|\mu_l(A_j)\| \le \sum_{j=1}^\infty \|\mu_l\|(A_j) = \|\mu_l\|\Big( \bigcup_{j=1}^\infty A_j \Big) \le \|\mu_l\|(X)$

for each l. In the limit as $l \to \infty$, we get that 

(137.8) $\sum_{j=1}^\infty \|\mu(A_j)\| \le \sup_{l \ge 1} \|\mu_l\|(X).$

The right side is finite because $\{\mu_l\}_{l=1}^\infty$ is a Cauchy sequence, and 
hence is bounded. It is easy to see that $\mu(A)$ is finitely additive, since 
$\mu_l(A)$ is finitely additive for each l. The continuity condition (137.3) can 
also be derived from the corresponding property of the $\mu_l$'s, using the fact 
that $\{\mu_l(A)\}_{l=1}^\infty$ converges to $\mu(A)$ uniformly on $\mathcal{A}$. 
Similarly, if $A_1, A_2, \ldots$ is a sequence of pairwise-disjoint measurable 
subsets of X whose union is equal to X, then 

(137.9) $\sum_{j=1}^\infty \|\mu_l(A_j) - \mu_n(A_j)\| \le \sum_{j=1}^\infty \|\mu_l - \mu_n\|(A_j) = \|\mu_l - \mu_n\|(X)$

for each $l, n \ge 1$, as before. This implies that 

(137.10) $\sum_{j=1}^\infty \|\mu(A_j) - \mu_n(A_j)\| \le \sup_{l \ge n} \|\mu_l - \mu_n\|(X)$

for each $n \ge 1$, by taking the limit as $l \to \infty$, as in (137.8). It follows that 

(137.11) $\|\mu - \mu_n\|(X) \le \sup_{l \ge n} \|\mu_l - \mu_n\|(X)$

for each n, by taking the supremum over all such partitions $\{A_j\}_{j=1}^\infty$ 
of X. This shows that $\mu \in \mathcal{M}(X, V)$ and that $\{\mu_n\}_{n=1}^\infty$ 
converges to $\mu$ with respect to the norm $\|\mu\|(X)$, and hence that 
$\mathcal{M}(X, V)$ is complete. 
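
As a small numerical illustration of the total variation norm, here is a minimal 
sketch for a finite set X in which every subset is measurable and V is the plane 
with the Euclidean norm. These choices, and the names `atoms`, `measure` and 
`total_variation`, are assumptions made for the example and are not part of the 
notes. On a finite set the partition into single points gives the largest sum, 
by the triangle inequality, so $\|\mu\|(A)$ is the sum of the norms of the point 
masses in A. 

```python
import math
from itertools import combinations

def norm(v):
    # Euclidean norm on V = R^2 (an assumption for this example)
    return math.sqrt(sum(c * c for c in v))

def measure(atoms, subset):
    # mu(A) is the sum of the vector-valued point masses in A
    dim = len(next(iter(atoms.values())))
    total = [0.0] * dim
    for x in subset:
        for i, c in enumerate(atoms[x]):
            total[i] += c
    return tuple(total)

def total_variation(atoms, subset):
    # ||mu||(A): the partition of A into single points attains the supremum
    return sum(norm(atoms[x]) for x in subset)

# hypothetical vector-valued point masses on X = {a, b, c}
atoms = {"a": (1.0, 0.0), "b": (0.0, 2.0), "c": (-1.0, 1.0)}
X = list(atoms)
tv_X = total_variation(atoms, X)

# (137.4): ||mu(A)|| <= ||mu||(A) for every subset A of X
check_137_4 = all(
    norm(measure(atoms, A)) <= total_variation(atoms, A) + 1e-12
    for r in range(len(X) + 1)
    for A in combinations(X, r)
)

# a coarser partition {a, b}, {c} gives a smaller sum than ||mu||(X)
coarse_sum = norm(measure(atoms, ["a", "b"])) + norm(measure(atoms, ["c"]))
```

Here `coarse_sum` is the sum in the definition of $\|\mu\|(X)$ for one particular 
partition, and it does not exceed `tv_X`. 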

138 Products and measures 

Let $(X, \mathcal{A}, \mu)$, $(Y, \mathcal{B}, \nu)$ be finite or $\sigma$-finite 
measure spaces, and let $F(x, y)$ be a measurable function on $X \times Y$. As 
usual, we put 

(138.1) $N_p(F)(x) = \Big( \int_Y |F(x, y)|^p \, d\nu(y) \Big)^{1/p}$

when $1 \le p < \infty$, and we let $N_\infty(F)(x)$ be the essential supremum of 
$|F(x, y)|$ over $y \in Y$. Suppose that 

(138.2) $\int_X N_p(F)(x) \, d\mu(x) < \infty$

for some p, $1 \le p \le \infty$, and put 

(138.3) $\phi(A) = \int_A F(x, y) \, d\mu(x)$

for each measurable set $A \subseteq X$. This defines $\phi(A)$ as a measurable 
function on Y which is in $L^p(Y)$ and satisfies 

(138.4) $\|\phi(A)\|_{L^p(Y)} \le \int_A N_p(F)(x) \, d\mu(x)$

by Minkowski's integral inequality. If $A_1, A_2, \ldots$ is a sequence of 
pairwise-disjoint measurable subsets of X, then 

(138.5) $\sum_{j=1}^\infty \|\phi(A_j)\|_{L^p(Y)} \le \sum_{j=1}^\infty \int_{A_j} N_p(F)(x) \, d\mu(x) \le \int_X N_p(F)(x) \, d\mu(x) < \infty.$

Thus $\sum_{j=1}^\infty \phi(A_j)$ converges in $L^p(Y)$, and it is easy to see that 

(138.6) $\sum_{j=1}^\infty \phi(A_j) = \phi\Big( \bigcup_{j=1}^\infty A_j \Big).$

Hence $\phi \in \mathcal{M}(X, L^p(Y))$. If $\|\phi\|(A)$ is as in the previous 
section, then 

(138.7) $\|\phi\|(A) \le \int_A N_p(F)(x) \, d\mu(x),$

because of (138.4). 
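
The estimate (138.4) can be checked numerically in a discrete setting. The 
following is only a sketch: X and Y are taken to be finite sets with weights 
`mu` and `nu`, and the matrix `F`, the exponent `p` and the set `A` are 
hypothetical values chosen for illustration. 

```python
def n_p(row, nu, p):
    # N_p(F)(x): the L^p(Y) norm of y -> F(x, y) for the weights nu
    return sum(abs(f) ** p * w for f, w in zip(row, nu)) ** (1.0 / p)

def phi(F, mu, A):
    # phi(A)(y) = sum over x in A of F(x, y) mu(x), a function of y
    cols = len(F[0])
    return [sum(F[x][y] * mu[x] for x in A) for y in range(cols)]

# hypothetical data: X has 3 points, Y has 3 points
F = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0],
     [2.0, 1.0, 4.0]]
mu = [0.5, 1.0, 0.25]   # weights of the points of X
nu = [1.0, 2.0, 0.5]    # weights of the points of Y
p = 3.0
A = [0, 2]              # a subset of X, given by indices

lhs = n_p(phi(F, mu, A), nu, p)                  # ||phi(A)||_{L^p(Y)}
rhs = sum(n_p(F[x], nu, p) * mu[x] for x in A)   # integral of N_p(F) over A
```

In this finite setting the comparison of `lhs` and `rhs` is Minkowski's 
inequality for a weighted sum of vectors in a weighted $\ell^p$ space. 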
139 $L^p$-Valued measures 

Let $(X, \mathcal{A})$ be a measurable space, and let $(Y, \mathcal{B}, \nu)$ be a 
$\sigma$-finite measure space. Suppose that $\mu \in \mathcal{M}(X, L^p(Y))$ for some 
p, $1 < p \le \infty$. Thus $\|\mu\|$ is a finite nonnegative real measure on X, and 
we can consider the product measure $\|\mu\| \times \nu$ on $X \times Y$. We would 
like to represent $\mu$ by a function on $X \times Y$, as in the previous section. 

Let $A_1, \ldots, A_n$ be finitely many pairwise-disjoint measurable subsets of X 
such that $\bigcup_{j=1}^n A_j = X$, and let $g_1(y), \ldots, g_n(y)$ be elements of 
$L^q(Y)$, where $1 \le q < \infty$ is the exponent conjugate to p, $1/p + 1/q = 1$. 
Put $G(x, y) = g_j(y)$ when $x \in A_j$, and 

(139.1) $L(G) = \sum_{j=1}^n \int_Y \mu(A_j)(y) \, g_j(y) \, d\nu(y).$

By Hölder's inequality, 

(139.2) $\Big| \int_Y \mu(A_j)(y) \, g_j(y) \, d\nu(y) \Big| \le \|\mu(A_j)\|_{L^p(Y)} \, \|g_j\|_{L^q(Y)} \le \|\mu\|(A_j) \, \|g_j\|_{L^q(Y)}$

for each j. This implies that 

(139.3) $|L(G)| \le \int_X N_q(G)(x) \, d\|\mu\|(x),$

where $N_q(G)(x)$ denotes the $L^q(Y)$ norm of $G(x, y)$ as a function of y, as usual. 
In particular, 

(139.4) $|L(G)| \le \|\mu\|(X)^{1/p} \, \|G\|_{L^q(X \times Y, \|\mu\| \times \nu)}.$

It is easy to see that (139.1) does not depend on the particular representation 
of $G(x, y)$ in the preceding paragraph, because $\mu$ is finitely additive. One can 
also check that the collection of these functions $G(x, y)$ forms a linear subspace 
of $L^q(X \times Y, \|\mu\| \times \nu)$, and that $L(G)$ defines a linear functional 
on this subspace. The main point is that any two partitions of X into finitely 
many measurable sets have a common refinement, and so any two functions of this 
type can be represented in this way using the same partition of X. This subspace 
is also dense in $L^q(X \times Y, \|\mu\| \times \nu)$, because $q < \infty$. We also 
know from (139.4) that $L(G)$ is a bounded linear functional on this subspace, with 
respect to the $L^q$ norm, and hence has a unique extension to a bounded linear 
functional on $L^q(X \times Y, \|\mu\| \times \nu)$. 

The Riesz representation theorem implies that there is a unique element 
$F(x, y)$ of $L^p(X \times Y, \|\mu\| \times \nu)$ such that 

(139.5) $L(G) = \int_{X \times Y} F(x, y) \, G(x, y) \, d\|\mu\|(x) \, d\nu(y)$

for every $G \in L^q(X \times Y, \|\mu\| \times \nu)$. If A is a measurable subset of 
X and $g(y) \in L^q(Y)$, then we can apply this to $G(x, y) = \mathbf{1}_A(x) \, g(y)$, 
to get that 

(139.6) $\int_Y \mu(A)(y) \, g(y) \, d\nu(y) = \int_Y \Big( \int_A F(x, y) \, d\|\mu\|(x) \Big) g(y) \, d\nu(y).$

It follows that 

(139.7) $\mu(A)(y) = \int_A F(x, y) \, d\|\mu\|(x)$

as elements of $L^p(Y)$ for every measurable set $A \subseteq X$, as in the previous 
section. Moreover, 

(139.8) $\|F\|_{L^p(X \times Y, \|\mu\| \times \nu)} \le \|\mu\|(X)^{1/p},$

because of (139.4). If $p = \infty$, then this says that the $L^\infty$ norm of 
$F(x, y)$ is less than or equal to 1 on $X \times Y$. Otherwise, if $p < \infty$, and 
if A is a measurable subset of X, then (139.3) implies that 

(139.9) $|L(G)| \le \|\mu\|(A)^{1/p} \, \|G\|_{L^q(A \times Y, \|\mu\| \times \nu)}$

when $G(x, y) = 0$ for every $x \in X \setminus A$. Hence 

(139.10) $\Big( \int_A \int_Y |F(x, y)|^p \, d\nu(y) \, d\|\mu\|(x) \Big)^{1/p} \le \|\mu\|(A)^{1/p},$

or equivalently, 

(139.11) $\int_A N_p(F)(x)^p \, d\|\mu\|(x) \le \|\mu\|(A).$

This shows that $N_p(F)(x) \le 1$ almost everywhere on X with respect to $\|\mu\|$. 

140 $\ell^p$-Valued measures 

Let $(X, \mathcal{A})$ be a measurable space, and let $\mu_1, \mu_2, \ldots$ be a 
sequence of real or complex-valued measures on X such that 
$\sum_{j=1}^\infty |\mu_j|(X) < \infty$. This implies that 

(140.1) $\sum_{j=1}^\infty |\mu_j(A)| \le \sum_{j=1}^\infty |\mu_j|(A) \le \sum_{j=1}^\infty |\mu_j|(X) < \infty$

for every measurable set $A \subseteq X$, which means that 
$\mu(A) = \{\mu_j(A)\}_{j=1}^\infty \in \ell^1$ for each $A \in \mathcal{A}$. Put 
$\rho(A) = \sum_{j=1}^\infty |\mu_j|(A)$, so that $\rho$ is a finite nonnegative real 
measure on X by hypothesis, and 

(140.2) $\|\mu(A)\|_1 = \sum_{j=1}^\infty |\mu_j(A)| \le \rho(A)$

for each $A \in \mathcal{A}$. Using this, one can check that 
$\mu \in \mathcal{M}(X, \ell^1)$, and that $\|\mu\|(A) \le \rho(A)$ for each 
$A \in \mathcal{A}$. 
This construction is actually equivalent to the one in Section 138 with $p = 1$ 
and $Y = \mathbf{Z}_+$, equipped with counting measure. This is because $\mu_j$ is 
absolutely continuous with respect to $\rho$ for each j, and hence can be expressed 
in terms of an integrable function $f_j$ with respect to $\rho$, as in the 
Radon-Nikodym theorem. The $L^1$ norm of $f_j$ with respect to $\rho$ is equal to 
$|\mu_j|(X)$ for each j, and is summable over j. Thus the sequence of $f_j$'s can 
be identified with an integrable function on $X \times \mathbf{Z}_+$, using $\rho$ 
as the measure on X. 

Conversely, suppose that $\mu \in \mathcal{M}(X, \ell^1)$. Thus 
$\mu(A) = \{\mu_j(A)\}_{j=1}^\infty$ for some real or complex-valued functions 
$\mu_j$ on $\mathcal{A}$, as appropriate. It is easy to see that $\mu_j$ is a real or 
complex measure on X for each j, because of the corresponding properties of $\mu$. 
A key point now is that 

(140.3) $\sum_{j=1}^\infty |\mu_j|(A) \le \|\mu\|(A)$

for every $A \in \mathcal{A}$. Of course, it suffices to show that 

(140.4) $\sum_{j=1}^n |\mu_j|(A) \le \|\mu\|(A)$

for every $A \in \mathcal{A}$ and $n \ge 1$. Remember that $|\mu_j|(A) = \rho_j^*(A)$ 
is defined as in Section 35 using $\rho_j(A) = |\mu_j(A)|$. More precisely, 
$\rho_j^*(A)$ can be defined as the supremum of sums of $\rho_j$ over partitions of 
A into finitely many measurable subsets. If we use the same partition of A for 
each j, then the desired estimate would follow from the definition of 
$\|\mu\|(A)$ as $\rho^*(A)$ with $\rho(A) = \|\mu(A)\|_{\ell^1}$. If instead we have 
different partitions of A for $j = 1, \ldots, n$, then we can use a common 
refinement of them to reduce to the case of a single partition of A. 

Suppose now that $\mu \in \mathcal{M}(X, \ell^p)$, $1 \le p \le \infty$. As in the 
preceding paragraph, $\mu(A) = \{\mu_j(A)\}_{j=1}^\infty$, where each $\mu_j$ is a 
real or complex measure on X. It is easy to see that $\mu_j$ is absolutely 
continuous with respect to $\|\mu\|$ for each j, and so can be expressed in terms 
of an integrable function with respect to $\|\mu\|$, by the Radon-Nikodym theorem. 
If $p = 1$, then the $L^1$ norms of these functions are summable, as before. If 
$p > 1$, then we are back in the situation of the previous section, with 
$Y = \mathbf{Z}_+$ equipped with counting measure. 

141 Finite sums 

Let $(X, \mathcal{A})$ be a measurable space, and let $(V, \|v\|)$ be a real or 
complex Banach space. Suppose that $\mu_1, \ldots, \mu_n$ are finitely many real or 
complex measures on X, as appropriate, and that $v_1, \ldots, v_n$ are vectors in V. 
It is easy to see that 

(141.1) $\mu(A) = \sum_{j=1}^n \mu_j(A) \, v_j$

defines an element of $\mathcal{M}(X, V)$. Of course 

(141.2) $\|\mu(A)\| \le \sum_{j=1}^n |\mu_j(A)| \, \|v_j\| \le \sum_{j=1}^n |\mu_j|(A) \, \|v_j\|$

for each $A \in \mathcal{A}$, which implies that 

(141.3) $\|\mu\|(A) \le \sum_{j=1}^n |\mu_j|(A) \, \|v_j\|.$

Let $\rho$ be a finite nonnegative real measure on X such that $\mu_j$ is absolutely 
continuous with respect to $\rho$ for each j. One can take 

(141.4) $\rho = \sum_{j=1}^n |\mu_j|,$

for instance. By the Radon-Nikodym theorem, there are integrable functions 
$f_1, \ldots, f_n$ on X with respect to $\rho$ such that 

(141.5) $\mu_j(A) = \int_A f_j \, d\rho$

for each $A \in \mathcal{A}$ and $j = 1, \ldots, n$. If 
$f(x) = \sum_{j=1}^n f_j(x) \, v_j$, then 

(141.6) $\|\mu(A)\| = \Big\| \int_A f \, d\rho \Big\| \le \int_A \|f\| \, d\rho$

for each $A \in \mathcal{A}$, as in Section 120. This implies that 

(141.7) $\|\mu\|(A) \le \int_A \|f\| \, d\rho$

for each $A \in \mathcal{A}$. More precisely, 

(141.8) $\|\mu\|(A) = \int_A \|f\| \, d\rho$

for each $A \in \mathcal{A}$ under these conditions. To see this, remember that 

(141.9) $\sum_{k=1}^r \|\mu(A_k)\| \le \|\mu\|(A)$

when $A_1, \ldots, A_r$ are pairwise-disjoint measurable sets whose union is A, by 
definition of $\|\mu\|(A)$. In order to show that 

(141.10) $\int_A \|f\| \, d\rho \le \|\mu\|(A),$

one can choose measurable sets $A_k$ on which the $f_j$'s are approximately constant. 

Let us now start with a measure $\mu \in \mathcal{M}(X, V)$ that takes values in a 
finite-dimensional linear subspace of V. If $v_1, \ldots, v_n$ is a basis for this 
linear subspace, then there are unique real or complex measures 
$\mu_1, \ldots, \mu_n$ on X for which $\mu$ can be expressed as in (141.1). Because 
any two norms on a finite-dimensional real or complex vector space are equivalent, 

(141.11) $\Big\| \sum_{j=1}^n t_j \, v_j \Big\| \ge c \max_{1 \le j \le n} |t_j|$

for some $c > 0$ and every $t_1, \ldots, t_n \in \mathbf{R}$ or $\mathbf{C}$, as 
appropriate. This implies that 

(141.12) $c \max_{1 \le j \le n} |\mu_j(A)| \le \|\mu(A)\| \le \|\mu\|(A)$

for each $A \in \mathcal{A}$, and hence that $\mu_j$ is absolutely continuous with 
respect to $\|\mu\|$ for each j. Thus we can take $\rho = \|\mu\|$ in the previous 
paragraphs, and it follows that the corresponding function f satisfies 
$\|f(x)\| = 1$ for almost every $x \in X$ with respect to $\|\mu\|$. 



142 Approximations 

Let $(X, \mathcal{A})$ be a measurable space, and let $(V, \|v\|)$ be a real or 
complex Banach space. Suppose that $\mu_1, \mu_2, \ldots$ is a sequence of elements 
of $\mathcal{M}(X, V)$ such that $\mu_j$ takes values in a finite-dimensional linear 
subspace $V_j$ of V for each j. Suppose also that $\{\mu_j\}_{j=1}^\infty$ converges 
to $\mu \in \mathcal{M}(X, V)$ with respect to the total variation norm, so that 

(142.1) $\lim_{j \to \infty} \|\mu_j - \mu\|(X) = 0.$

Let $\rho$ be a finite nonnegative real measure on X such that $\|\mu_j\|$ is 
absolutely continuous with respect to $\rho$ for each j, such as 

(142.2) $\rho(A) = \sum_{j=1}^\infty a_j \, \|\mu_j\|(A)$

for some $a_j > 0$ with $\sum_{j=1}^\infty a_j \, \|\mu_j\|(X) < \infty$. Thus each 
$\mu_j$ can be expressed as 

(142.3) $\mu_j(A) = \int_A f_j \, d\rho$

for some $V_j$-valued integrable function $f_j$ on X with respect to $\rho$, by 
applying the Radon-Nikodym theorem to the components of $\mu_j(A)$ with respect to 
a basis for $V_j$ as in the previous section. More precisely, each $f_j$ is the sum 
of finitely many real or complex-valued integrable functions on X with respect 
to $\rho$ times basis vectors of $V_j$, and the integral of $f_j$ over A is the sum 
of the integrals of the components of $f_j$ over A times the corresponding basis 
vectors of $V_j$. We also have that 

(142.4) $\int_X \|f_j - f_l\| \, d\rho = \|\mu_j - \mu_l\|(X) \to 0$

as $j, l \to \infty$, because of (142.1). 
143 Uniform convexity 



Let V be a vector space with a norm $\|v\|$. It will be convenient to take V to be 
a real vector space here, but complex vector spaces can also be considered as 
real vector spaces, and so everything in this section works as well in that case. 
We say that V is uniformly convex if for every $\epsilon > 0$ there is a 
$\delta > 0$ such that 

(143.1) $v, w \in V$, $\|v\| = \|w\| = 1$, and $\Big\| \frac{v + w}{2} \Big\| > 1 - \delta$

imply that 

(143.2) $\|v - w\| < \epsilon.$

It is easy to see that inner product spaces are uniformly convex, because of 
the parallelogram law. It is well known that real and complex $L^p$ spaces are 
uniformly convex when $1 < p < \infty$. 
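
The claim about inner product spaces can be made quantitative. The following 
worked computation is a sketch under the assumption that the norm comes from an 
inner product. The explicit choice $\delta = \min(\epsilon^2/8, 1)$ is one 
convenient value picked for this illustration and is not taken from the notes. 

```latex
% Parallelogram law for unit vectors v, w in an inner product space:
\|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2 - \|v + w\|^2
            = 4 - 4 \Big\| \frac{v + w}{2} \Big\|^2 .
% If \|(v + w)/2\| > 1 - \delta with 0 < \delta \le 1, then
\|v - w\|^2 < 4 - 4 (1 - \delta)^2 = 8 \delta - 4 \delta^2 < 8 \delta .
% Hence \delta = \min(\epsilon^2 / 8, 1) gives
\|v - w\|^2 < 8 \delta \le \epsilon^2 ,
\qquad \text{that is,} \qquad \|v - w\| < \epsilon .
```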

Suppose that $v, w \in V$, $\|v\|, \|w\| \le 1$, and 

(143.3) $\Big\| \frac{v + w}{2} \Big\| > 1 - \delta_1$

for some $\delta_1 \in (0, 1/2)$. In particular, 

(143.4) $\frac{\|v\| + \|w\|}{2} > 1 - \delta_1 > \frac{1}{2},$

and so $\|v\|, \|w\| > 0$. If $v' = v/\|v\|$, $w' = w/\|w\|$, then 

(143.5) $\|v' - v\| = (\|v\|^{-1} - 1) \, \|v\| = 1 - \|v\|,$

and similarly for w. Thus 

(143.6) $\frac{\|v' - v\| + \|w' - w\|}{2} = 1 - \frac{\|v\| + \|w\|}{2} < \delta_1,$

which implies that 

(143.7) $\Big\| \frac{v + w}{2} \Big\| \le \Big\| \frac{v' + w'}{2} \Big\| + \frac{\|v - v'\| + \|w - w'\|}{2} < \Big\| \frac{v' + w'}{2} \Big\| + \delta_1$

and 

(143.8) $\Big\| \frac{v' + w'}{2} \Big\| > 1 - 2 \delta_1.$

If $\delta_1$ is sufficiently small, then 

(143.9) $\|v' - w'\| < \epsilon/2,$

by uniform convexity. If also $\delta_1 < \epsilon/4$, then 

(143.10) $\|v - w\| \le \|v' - w'\| + \|v - v'\| + \|w - w'\| < \epsilon/2 + 2 \delta_1 < \epsilon.$

This shows that uniform convexity implies the analogous condition in which 
$\|v\|, \|w\| \le 1$. 

Suppose that $v_1, \ldots, v_n \in V$, $\|v_j\| \le 1$ for $j = 1, \ldots, n$, 
$t_1, \ldots, t_n$ are nonnegative real numbers, and that $\sum_{j=1}^n t_j = 1$. 
Let $\epsilon > 0$ be given, and put 

(143.11) $a = \sum_{j=1}^n t_j \, v_j.$

Thus $\|a\| \le 1$, and we would like to show that there is an $\eta > 0$ such that 
$\|a\| > 1 - \eta$ implies that 

(143.12) $\sum_{j=1}^n t_j \, \|v_j - a\| < \epsilon,$

where $\eta$ does not depend on n, the $v_j$'s, or the $t_j$'s. Let $\lambda$ be a 
bounded linear functional on V such that $\|\lambda\|_* = 1$ and 
$\lambda(a) = \|a\|$, the existence of which follows from the Hahn-Banach theorem, 
as usual. Hence 

(143.13) $\sum_{j=1}^n t_j \, \lambda(v_j) = \lambda(a) = \|a\| > 1 - \eta,$

which implies that 

(143.14) $\sum_{j=1}^n t_j \, (1 - \lambda(v_j)) < \eta.$

Note that $1 - \lambda(v_j) \ge 0$ for each j, because $|\lambda(v_j)| \le 1$. In 
addition, 

(143.15) $\Big\| \frac{v_j + a}{2} \Big\| \ge \lambda\Big( \frac{v_j + a}{2} \Big) = \frac{\lambda(v_j) + \|a\|}{2}.$

Let $\delta_2$ be associated to $\epsilon/2$ as in the second version of uniform 
convexity. If $\lambda(v_j) > 1 - \delta_2$ and $\eta < \delta_2$, then 

(143.16) $\Big\| \frac{v_j + a}{2} \Big\| \ge \frac{\lambda(v_j) + \|a\|}{2} > 1 - \delta_2,$

and so 

(143.17) $\|v_j - a\| < \epsilon/2.$

Let $I_1$ be the set of $j = 1, \ldots, n$ such that $\lambda(v_j) > 1 - \delta_2$, 
and let $I_2$ be the set of $j = 1, \ldots, n$ such that 
$\lambda(v_j) \le 1 - \delta_2$. If $\eta < \delta_2$, then 

(143.18) $\sum_{j \in I_1} t_j \, \|v_j - a\| < \epsilon/2,$

by the preceding computation. Of course, $\|v_j - a\| \le \|v_j\| + \|a\| \le 2$ 
for each j, and so 

(143.19) $\sum_{j \in I_2} t_j \, \|v_j - a\| \le 2 \sum_{j \in I_2} t_j.$

Using (143.14), we get that 

(143.20) $\delta_2 \sum_{j \in I_2} t_j \le \sum_{j \in I_2} t_j \, (1 - \lambda(v_j)) < \eta,$

which implies that 

(143.21) $\sum_{j \in I_2} t_j \, \|v_j - a\| \le 2 \sum_{j \in I_2} t_j < 2 \, \delta_2^{-1} \, \eta.$

Therefore 

(143.22) $\sum_{j=1}^n t_j \, \|v_j - a\| = \sum_{j \in I_1} t_j \, \|v_j - a\| + \sum_{j \in I_2} t_j \, \|v_j - a\| < \epsilon/2 + 2 \, \delta_2^{-1} \, \eta < \epsilon$

when $\eta$ is sufficiently small, as desired. 
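
The dependence of $\eta$ on $\epsilon$ can be made explicit in the Euclidean 
plane. The sketch below assumes the Euclidean norm, for which the parallelogram 
law allows $\delta_2 = (\epsilon/2)^2/8$. It then picks $\eta$ so that 
$\eta < \delta_2$ and $2 \, \delta_2^{-1} \, \eta < \epsilon/2$, as the argument 
requires. The function names and the sample vectors are hypothetical. 

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def eta_for(eps):
    # Assumption: Euclidean norm.  For ||v||, ||w|| <= 1 the parallelogram law
    # gives ||v - w||^2 <= 4 - 4 ||(v + w)/2||^2, which is < 8 d whenever
    # ||(v + w)/2|| > 1 - d, so delta_2 = (eps/2)^2 / 8 serves for eps / 2.
    delta2 = min((eps / 2.0) ** 2 / 8.0, 1.0)
    # the argument needs eta < delta_2 and 2 * eta / delta_2 < eps / 2
    return 0.5 * min(delta2, eps * delta2 / 4.0)

def weighted_spread(vs, ts):
    # returns ||a|| and sum_j t_j ||v_j - a|| for a = sum_j t_j v_j
    dim = len(vs[0])
    a = tuple(sum(t * v[i] for v, t in zip(vs, ts)) for i in range(dim))
    spread = sum(
        t * norm(tuple(vi - ai for vi, ai in zip(v, a)))
        for v, t in zip(vs, ts)
    )
    return norm(a), spread

eps = 0.5
eta = eta_for(eps)
# unit vectors clustered around (1, 0), with convex weights t_j
vs = [(math.cos(th), math.sin(th)) for th in (-0.01, 0.0, 0.01)]
ts = (0.3, 0.4, 0.3)
norm_a, spread = weighted_spread(vs, ts)
```

For these sample vectors `norm_a` exceeds `1 - eta`, so the hypothesis of 
(143.12) holds, and `spread` is the left side of (143.12). 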

144 Uniform convexity and measures 

Let $(X, \mathcal{A})$ be a measurable space, and let $(V, \|v\|)$ be a uniformly 
convex Banach space. Also let $\epsilon > 0$ be given, and let $\eta$ be as in the 
previous section. Suppose that $\mu \in \mathcal{M}(X, V)$ satisfies 

(144.1) $\|\mu(X)\| > (1 - \eta) \, \|\mu\|(X).$

Let $\mu_0 \in \mathcal{M}(X, V)$ be defined by 

(144.2) $\mu_0(A) = \frac{\mu(X)}{\|\mu\|(X)} \, \|\mu\|(A),$

so that $\mu_0$ is the vector $\mu(X)/\|\mu\|(X)$ times the nonnegative real 
measure $\|\mu\|$ on X. We would like to show that 

(144.3) $\|\mu - \mu_0\|(X) \le \epsilon \, \|\mu\|(X)$

under these conditions. 

We may as well suppose also that $\|\mu\|(X) = 1$, since otherwise we can divide 
$\mu$ by $\|\mu\|(X) > 0$. Let $A_1, \ldots, A_n$ be finitely many pairwise-disjoint 
measurable subsets of X such that $X = \bigcup_{j=1}^n A_j$, and let us check that 

(144.4) $\sum_{j=1}^n \|\mu(A_j) - \mu_0(A_j)\| < \epsilon.$

If $\|\mu\|(A_j) = 0$ for some j, then $\mu(A_j) = \mu_0(A_j) = 0$, and we can absorb 
$A_j$ into one of the other $A_l$'s without affecting the sum. Thus we may as well 
ask that $\|\mu\|(A_j) > 0$ for each j too. If we put 

(144.5) $t_j = \|\mu\|(A_j)$ and $v_j = \frac{\mu(A_j)}{\|\mu\|(A_j)},$

then $\sum_{j=1}^n t_j = 1$ and $\|v_j\| \le 1$ for each j, because 
$\|\mu(A_j)\| \le \|\mu\|(A_j)$. Also, 

(144.6) $\sum_{j=1}^n t_j \, v_j = \sum_{j=1}^n \mu(A_j) = \mu(X),$

and 

(144.7) $\sum_{j=1}^n \|\mu(A_j) - \mu_0(A_j)\| = \sum_{j=1}^n t_j \, \|v_j - \mu(X)\|.$

Thus (144.4) reduces to (143.12), with $a = \mu(X)$. 

Now let $\mu$ be any element of $\mathcal{M}(X, V)$, and let $\theta$ be a small 
positive real number. By the definition of $\|\mu\|(X)$, there are finitely many 
pairwise-disjoint measurable sets $X_1, \ldots, X_r$ such that 
$X = \bigcup_{l=1}^r X_l$ and 

(144.8) $\|\mu\|(X) < \sum_{l=1}^r \|\mu(X_l)\| + \theta.$

Of course, $\|\mu\|(X) = \sum_{l=1}^r \|\mu\|(X_l)$, and so 

(144.9) $\sum_{l=1}^r \big( \|\mu\|(X_l) - \|\mu(X_l)\| \big) < \theta.$

Each term in the sum is nonnegative, since $\|\mu(X_l)\| \le \|\mu\|(X_l)$. If $L_2$ 
is the set of $l = 1, \ldots, r$ such that 

(144.10) $\|\mu(X_l)\| \le (1 - \eta) \, \|\mu\|(X_l),$

where $\eta > 0$ is as before, then it follows that 

(144.11) $\eta \sum_{l \in L_2} \|\mu\|(X_l) \le \sum_{l \in L_2} \big( \|\mu\|(X_l) - \|\mu(X_l)\| \big) < \theta.$

Let $L_1$ be the set of $l = 1, \ldots, r$ such that 
$\|\mu(X_l)\| > (1 - \eta) \, \|\mu\|(X_l)$, and for each $l \in L_1$, let 
$\mu_l \in \mathcal{M}(X, V)$ be defined by 

(144.12) $\mu_l(A) = \frac{\mu(X_l)}{\|\mu\|(X_l)} \, \|\mu\|(A \cap X_l).$

This is analogous to (144.2), applied to the restriction of $\mu$ to $X_l$, and it 
follows from the earlier discussion that 

(144.13) $\|\mu - \mu_l\|(X_l) \le \epsilon \, \|\mu\|(X_l)$

for each $l \in L_1$. Combining this with the earlier estimate (144.11) for $L_2$, 
we get that 

(144.14) $\Big\| \mu - \sum_{l \in L_1} \mu_l \Big\|(X) \le \sum_{l \in L_1} \epsilon \, \|\mu\|(X_l) + \eta^{-1} \, \theta \le \epsilon \, \|\mu\|(X) + \eta^{-1} \, \theta.$

Remember that $\eta$ depends on $\epsilon$, while $\theta$ can be chosen 
independently of $\epsilon$, $\eta$. Thus the right side can be made arbitrarily 
small, by choosing $\epsilon$ and then $\theta$ appropriately. 

145 Uniform convexity and paths 

Let $(V, \|v\|)$ be a uniformly convex Banach space, and let $f : [x, y] \to V$ be 
a path of finite length $\Lambda_x^y$. Also let $\epsilon > 0$ be given, and let 
$\eta = \eta(\epsilon)$ be as in Section 143. Suppose that 

(145.1) $\|f(x) - f(y)\| > (1 - \eta) \, \Lambda_x^y.$

Put 

(145.2) $f_0(z) = \frac{f(y) - f(x)}{\Lambda_x^y} \, \Lambda_x^z,$

where $\Lambda_x^z$ is the length of f on $[x, z]$, $x \le z \le y$. We would like 
to show that 

(145.3) the length of $f - f_0$ on $[x, y]$ is $\le \epsilon \, \Lambda_x^y$. 

This is basically the same as the argument for measures in the previous section. 
As before, we may as well suppose that $\Lambda_x^y = 1$, since otherwise we can 
divide f by $\Lambda_x^y$. 

If $\{r_j\}_{j=0}^n$ is any partition of $[x, y]$, then we would like to show that 

(145.4) $\sum_{j=1}^n \|(f(r_j) - f_0(r_j)) - (f(r_{j-1}) - f_0(r_{j-1}))\| = \sum_{j=1}^n \|(f(r_j) - f(r_{j-1})) - (f_0(r_j) - f_0(r_{j-1}))\| < \epsilon.$

We may as well ask that the length $\Lambda_{r_{j-1}}^{r_j}$ of f on 
$[r_{j-1}, r_j]$ be positive for each $j = 1, \ldots, n$, since otherwise f, $f_0$ 
are constant on $[r_{j-1}, r_j]$, and $r_j$ or $r_{j-1}$ could be removed from the 
partition without affecting the sum. Put 

(145.5) $t_j = \Lambda_{r_{j-1}}^{r_j}$ and $v_j = \frac{f(r_j) - f(r_{j-1})}{\Lambda_{r_{j-1}}^{r_j}},$

so that $\sum_{j=1}^n t_j = 1$ and $\|v_j\| \le 1$ for each j, because 
$\|f(r_j) - f(r_{j-1})\| \le \Lambda_{r_{j-1}}^{r_j}$. Moreover, 

(145.6) $\sum_{j=1}^n t_j \, v_j = \sum_{j=1}^n (f(r_j) - f(r_{j-1})) = f(y) - f(x),$

and 

(145.7) $\sum_{j=1}^n \|(f(r_j) - f(r_{j-1})) - (f_0(r_j) - f_0(r_{j-1}))\| = \sum_{j=1}^n t_j \, \|v_j - (f(y) - f(x))\|.$

Thus (145.4) follows from (143.12), with $a = f(y) - f(x)$. 

Now let $f : [a, b] \to V$ be a path of finite length $\Lambda_a^b$, and let 
$\theta$ be a small positive real number. By the definition of $\Lambda_a^b$, there 
is a partition $\{x_l\}_{l=0}^r$ of $[a, b]$ such that 

(145.8) $\Lambda_a^b < \sum_{l=1}^r \|f(x_l) - f(x_{l-1})\| + \theta.$

This implies that 

(145.9) $\sum_{l=1}^r \big( \Lambda_{x_{l-1}}^{x_l} - \|f(x_l) - f(x_{l-1})\| \big) < \theta,$

because $\Lambda_a^b = \sum_{l=1}^r \Lambda_{x_{l-1}}^{x_l}$. Note that the terms in 
the sum are nonnegative, since 
$\|f(x_l) - f(x_{l-1})\| \le \Lambda_{x_{l-1}}^{x_l}$. If $L_2$ is the set of 
$l = 1, \ldots, r$ such that 

(145.10) $\|f(x_l) - f(x_{l-1})\| \le (1 - \eta) \, \Lambda_{x_{l-1}}^{x_l},$

where $\eta > 0$ is as before, then 

(145.11) $\eta \sum_{l \in L_2} \Lambda_{x_{l-1}}^{x_l} \le \sum_{l \in L_2} \big( \Lambda_{x_{l-1}}^{x_l} - \|f(x_l) - f(x_{l-1})\| \big) < \theta.$

Let $L_1$ be the set of $l = 1, \ldots, r$ such that 

(145.12) $\|f(x_l) - f(x_{l-1})\| > (1 - \eta) \, \Lambda_{x_{l-1}}^{x_l}.$

If $l \in L_1$, then define $f_l : [a, b] \to V$ by 

(145.13) $f_l(z) = \frac{f(x_l) - f(x_{l-1})}{\Lambda_{x_{l-1}}^{x_l}} \, \Lambda_{x_{l-1}}^z$

when $x_{l-1} \le z \le x_l$, and put $f_l(z) = 0$ when $z \le x_{l-1}$, 
$f_l(z) = f(x_l) - f(x_{l-1})$ when $z \ge x_l$. This is the same as (145.2) on 
$[x_{l-1}, x_l]$ with $x = x_{l-1}$, $y = x_l$. As in (145.3), the length of 
$f - f_l$ on $[x_{l-1}, x_l]$ is less than or equal to 
$\epsilon \, \Lambda_{x_{l-1}}^{x_l}$. Combining this with (145.11), we get that the 
length of $f - \sum_{l \in L_1} f_l$ on $[a, b]$ is less than or equal to 

(145.14) $\sum_{l \in L_1} \epsilon \, \Lambda_{x_{l-1}}^{x_l} + \eta^{-1} \, \theta \le \epsilon \, \Lambda_a^b + \eta^{-1} \, \theta.$

This uses the fact that the length of a path on $[a, b]$ is the sum of the lengths 
of its restrictions to the intervals $[x_{l-1}, x_l]$, $1 \le l \le r$. If 
$l \in L_1$, then $f_j$ is constant on $[x_{l-1}, x_l]$ when $j \ne l$, by 
construction, and so the length of $f - \sum_{j \in L_1} f_j$ is the same as the 
length of $f - f_l$ on this interval. Similarly, if $l \in L_2$, then $f_j$ is 
constant on $[x_{l-1}, x_l]$ for each $j \in L_1$, and the length of 
$f - \sum_{j \in L_1} f_j$ is the same as the length of f on this interval. It 
follows from this estimate that the length of $f - \sum_{j \in L_1} f_j$ can be 
made arbitrarily small, first by choosing $\epsilon$ to be very small, and then 
choosing $\theta$ to be sufficiently small, depending on $\eta$, which also depends 
on $\epsilon$. 



146 Uniform convexity and martingales 

Let $(V, \|v\|)$ be a uniformly convex Banach space. Also let $\epsilon > 0$ be 
given, and let $\eta > 0$ be as in Section 143. We may as well ask that 
$\eta \le \epsilon$ too, which is practically unavoidable anyway. 

Let $(X, \mathcal{A}, \mu)$ be a probability space, and let 
$\mathcal{P}_1, \mathcal{P}_2, \ldots$ be a sequence of partitions of X into finitely 
or countably many pairwise-disjoint measurable subsets of positive measure such 
that $\mathcal{P}_{j+1}$ is a refinement of $\mathcal{P}_j$ for each j. As usual, the 
arguments that follow are a bit simpler when each $\mathcal{P}_j$ has only finitely 
many elements, but countable partitions and other situations can be accommodated 
as well. Let $\mathcal{B}_j = \mathcal{B}(\mathcal{P}_j)$ be the $\sigma$-algebra of 
measurable subsets of X generated by $\mathcal{P}_j$, as in Section 77, so that 
$\mathcal{B}_j \subseteq \mathcal{B}_{j+1}$ for each j. 

We would like to consider V-valued martingales on X with respect to this 
filtration, as in Section 103. Remember that a V-valued function $f_j$ on X is 
measurable with respect to $\mathcal{B}_j$ if and only if it is constant on the 
elements of $\mathcal{P}_j$. Suppose that we have a sequence $\{f_j\}_{j=1}^\infty$ 
of V-valued functions on X such that $f_j$ is measurable with respect to 
$\mathcal{B}_j$ for each j and has bounded $L^1$ norm. Suppose also that 
$\{f_j\}_{j=1}^\infty$ is a martingale with respect to the $\mathcal{B}_j$'s, so 
that the value of $f_j$ on $B \in \mathcal{P}_j$ is equal to the average of the 
values of $f_{j+1}$ on the sets $A \in \mathcal{P}_{j+1}$ with $A \subseteq B$. 

Under these conditions, $\{\|f_j\|\}_{j=1}^\infty$ is a submartingale on X with 
respect to the $\mathcal{B}_j$'s. In particular, the $L^1$ norm of $\|f_j\|$ is 
monotone increasing in j, and so 

(146.1) $\lim_{j \to \infty} \int_X \|f_j\| \, d\mu = \sup_{j \ge 1} \int_X \|f_j\| \, d\mu.$

Let $\theta$ be a small positive real number, and suppose that 

(146.2) $\sup_{n \ge 1} \int_X \|f_n\| \, d\mu < \int_X \|f_j\| \, d\mu + \theta.$

Note that 

(146.3) $\int_B \|f_{n+1}\| \, d\mu \ge \int_B \|f_n\| \, d\mu$

when $B \in \mathcal{P}_j$ and $n \ge j$, because $\{\|f_n\|\}_{n=1}^\infty$ is a 
submartingale. Moreover, 

(146.4) $\lim_{n \to \infty} \int_X \|f_n\| \, d\mu = \lim_{n \to \infty} \sum_{B \in \mathcal{P}_j} \int_B \|f_n\| \, d\mu = \sum_{B \in \mathcal{P}_j} \lim_{n \to \infty} \int_B \|f_n\| \, d\mu.$

This is obvious when $\mathcal{P}_j$ has only finitely many elements, and otherwise 
one can use the monotone convergence theorem for sums. It follows that 

(146.5) $\sum_{B \in \mathcal{P}_j} \Big( \lim_{n \to \infty} \int_B \|f_n\| \, d\mu - \int_B \|f_j\| \, d\mu \Big) < \theta,$

where each term in the sum is nonnegative. 
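Let $\mathcal{P}_j'$ be the set of $B \in \mathcal{P}_j$ such that 

(146.6) $\int_B \|f_j\| \, d\mu > (1 - \eta) \lim_{n \to \infty} \int_B \|f_n\| \, d\mu.$

Thus $\mathcal{P}_j'' = \mathcal{P}_j \setminus \mathcal{P}_j'$ consists of 
$B \in \mathcal{P}_j$ such that 

(146.7) $\int_B \|f_j\| \, d\mu \le (1 - \eta) \lim_{n \to \infty} \int_B \|f_n\| \, d\mu,$

and satisfies 

(146.8) $\eta \sum_{B \in \mathcal{P}_j''} \lim_{n \to \infty} \int_B \|f_n\| \, d\mu < \theta,$

by (146.5). 

Let $f_n(A)$ be the value of $f_n$ on $A \in \mathcal{P}_n$, as in Section [TU51. Thus 

(146.9) $f_j(B) = \sum_{A \in \mathcal{P}_n, \, A \subseteq B} f_n(A) \, \frac{\mu(A)}{\mu(B)}$

when $B \in \mathcal{P}_j$ and $n \ge j$, because $\{f_n\}_{n=1}^\infty$ is a 
martingale. In addition, 

(146.10) $\int_B \|f_j\| \, d\mu = \|f_j(B)\| \, \mu(B)$

and 

(146.11) $\int_B \|f_n\| \, d\mu = \sum_{A \in \mathcal{P}_n, \, A \subseteq B} \|f_n(A)\| \, \mu(A).$

Note that 

(146.12) $\int_B \|f_n\| \, d\mu \ge \int_B \|f_j\| \, d\mu > 0$

when $B \in \mathcal{P}_j'$ and $n \ge j$, and put 

(146.13) $t_n(A) = \|f_n(A)\| \, \mu(A) \Big( \int_B \|f_n\| \, d\mu \Big)^{-1}$

for each $A \in \mathcal{P}_n$ with $A \subseteq B$, so that 

(146.14) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} t_n(A) = 1$

by construction. Also put $v_n(A) = f_n(A)/\|f_n(A)\|$ when $A \in \mathcal{P}_n$ and 
$f_n(A) \ne 0$, and $v_n(A) = 0$ when $f_n(A) = 0$, so that 

(146.15) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} v_n(A) \, t_n(A) = \sum_{A \in \mathcal{P}_n, \, A \subseteq B} f_n(A) \, \mu(A) \Big( \int_B \|f_n\| \, d\mu \Big)^{-1} = f_j(B) \, \mu(B) \Big( \int_B \|f_n\| \, d\mu \Big)^{-1}.$

It follows that 

(146.16) $\Big\| \sum_{A \in \mathcal{P}_n, \, A \subseteq B} v_n(A) \, t_n(A) \Big\| > 1 - \eta$

when $B \in \mathcal{P}_j'$ and $n \ge j$. 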

This is exactly the situation discussed in Section 143, except that the sum 
in (146.16) may have infinitely many terms, which can be handled in the same 
way as before. If 

(146.17) $\alpha_{j,n}(B) = f_j(B) \, \mu(B) \Big( \int_B \|f_n\| \, d\mu \Big)^{-1},$

then we get that 

(146.18) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} \|v_n(A) - \alpha_{j,n}(B)\| \, t_n(A) < \epsilon$

when $B \in \mathcal{P}_j'$ and $n \ge j$. Put $\alpha_j(B) = f_j(B)/\|f_j(B)\|$, 
which is the same as $\alpha_{j,j}(B)$, and observe that 

(146.19) $\alpha_{j,n}(B) = \alpha_j(B) \, \frac{\int_B \|f_j\| \, d\mu}{\int_B \|f_n\| \, d\mu}.$

This implies that 

(146.20) $\|\alpha_j(B) - \alpha_{j,n}(B)\| < \eta$

when $B \in \mathcal{P}_j'$ and $n \ge j$. Combining this with (146.18), we get that 

(146.21) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} \|v_n(A) - \alpha_j(B)\| \, t_n(A) < \epsilon + \eta \le 2 \epsilon$

when $B \in \mathcal{P}_j'$ and $n \ge j$. Equivalently, 

(146.22) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} \|v_n(A) - \alpha_j(B)\| \, \|f_n(A)\| \, \mu(A) < 2 \epsilon \int_B \|f_n\| \, d\mu$

when $B \in \mathcal{P}_j'$ and $n \ge j$, which reduces to 

(146.23) $\sum_{A \in \mathcal{P}_n, \, A \subseteq B} \big\| f_n(A) - \alpha_j(B) \, \|f_n(A)\| \big\| \, \mu(A) < 2 \epsilon \int_B \|f_n\| \, d\mu,$

using the definition of $v_n(A)$. The sum on the left can be expressed as an 
integral, so that 

(146.24) $\int_B \big\| f_n - \alpha_j(B) \, \|f_n\| \big\| \, d\mu < 2 \epsilon \int_B \|f_n\| \, d\mu$

when $B \in \mathcal{P}_j'$ and $n \ge j$. Put $\alpha_j(B) = 0$ when 
$B \in \mathcal{P}_j''$, and let $\alpha_j(x)$ be the V-valued function on X equal 
to $\alpha_j(B)$ when $x \in B \in \mathcal{P}_j$. Summing the previous estimate 
over $B \in \mathcal{P}_j'$, and using (146.8) for $B \in \mathcal{P}_j''$, we get 
that 

(146.25) $\int_X \big\| f_n - \alpha_j \, \|f_n\| \big\| \, d\mu < 2 \epsilon \int_X \|f_n\| \, d\mu + \eta^{-1} \, \theta$

when $n \ge j$. 

As usual, the right side of (146.25) can be made arbitrarily small, by first 
choosing $\epsilon$ to be as small as one likes, and then choosing $\theta$ 
depending on $\eta$, which depends on $\epsilon$. This works uniformly over 
$n \ge j$, because the $L^1$ norm of $\|f_n\|$ is bounded, by hypothesis. Because 
$\{\|f_n\|\}_{n=1}^\infty$ is a submartingale on X with bounded integral, there is 
a real-valued martingale $\{g_n\}_{n=1}^\infty$ on X such that $\|f_n\| \le g_n$ and 

(146.26) $\int_X g_n \, d\mu = \lim_{l \to \infty} \int_X \|f_l\| \, d\mu$

for each n, as in Section |9l]. Of course, the integral of $g_n$ over X is 
independent of n, because of the martingale condition. In particular, 

(146.27) $\int_X (g_n - \|f_n\|) \, d\mu < \theta$

when $n \ge j$, by (146.2) and the monotonicity of the integral of $\|f_n\|$. Using 
(146.25), we get that 

(146.28) $\int_X \|f_n - \alpha_j \, g_n\| \, d\mu < 2 \epsilon \int_X \|f_n\| \, d\mu + (\eta^{-1} + 1) \, \theta$

when $n \ge j$, since $\|\alpha_j(x)\| \le 1$ for every $x \in X$, by construction. 
Note that $\{\alpha_j \, g_n\}_{n=j}^\infty$ is a V-valued martingale on X, because 
$\{g_n\}_{n=1}^\infty$ is a martingale on X and $\alpha_j$ is constant on the 
elements of $\mathcal{P}_j$. 
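
The submartingale property of the norms, which drives the argument in this 
section, can be seen concretely. The sketch below assumes that X consists of 8 
points with uniform probability, that V is the Euclidean plane, and that the 
partitions are into consecutive blocks of 8, 4, 2 and 1 points. The terminal 
values `f_last` and the helper names are hypothetical. 

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def condition(values, block):
    # replace the values on each block of consecutive points by their
    # average, which is the conditional expectation for uniform probability
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        dim = len(chunk[0])
        avg = tuple(sum(c[i] for c in chunk) / len(chunk) for i in range(dim))
        out.extend([avg] * len(chunk))
    return out

def l1_norm(values):
    # integral of ||f|| over X with respect to uniform probability
    return sum(norm(v) for v in values) / len(values)

# hypothetical values of the last function in the martingale
f_last = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
          (2.0, 1.0), (1.0, 2.0), (0.0, 0.0), (3.0, -1.0)]

blocks = (8, 4, 2, 1)                      # coarse to fine partitions
martingale = [condition(f_last, b) for b in blocks]
norms = [l1_norm(f) for f in martingale]   # L^1 norms of ||f_j||

# martingale property: averaging f_{j+1} over the blocks of P_j gives f_j
consistent = all(
    all(norm(tuple(p - q for p, q in zip(u, w))) < 1e-12
        for u, w in zip(condition(martingale[k + 1], blocks[k]), martingale[k]))
    for k in range(len(blocks) - 1)
)
```

The list `norms` is monotone increasing, by the triangle inequality applied to 
each average. 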



147 Strict convexity 

Let V be a real vector space with a norm $\|v\|$. As before, a complex vector 
space is automatically a real vector space too, and so everything in this section 
can be used in that case as well. The closed unit ball 

(147.1) $B_1 = \{v \in V : \|v\| \le 1\}$

in V is said to be strictly convex if for every $v, w \in B_1$ with $v \ne w$ and 
every $t \in \mathbf{R}$ with $0 < t < 1$ we have that 

(147.2) $\|t \, v + (1 - t) \, w\| < 1.$

Of course, (147.2) holds automatically when $\|v\| < 1$ or $\|w\| < 1$, and so it 
suffices to check this when $\|v\| = \|w\| = 1$. One can show that the unit ball in 
an inner product space is strictly convex by determining when equality occurs in 
the Cauchy-Schwarz inequality. The unit ball in an $L^p$ space is strictly convex 
when $1 < p < \infty$, because of the strict convexity of the function $|x|^p$ on 
the real line. This is similar to the proof of the convexity of the unit ball in 
$L^p$ using the convexity of $|x|^p$, as in Section [S]. Note that $B_1$ is 
strictly convex when V is uniformly convex. 
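
A quick numerical contrast may help here. The sketch below works in the plane 
with the $\ell^p$ norms and evaluates the norm of the midpoint of two distinct 
unit vectors for $p = 1$, $p = 2$ and $p = \infty$. The vectors are chosen for 
illustration, and the helper names are hypothetical. 

```python
def lp_norm(v, p):
    # l^p norm on R^n, with p = float("inf") meaning the max norm
    if p == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def midpoint_norm(v, w, p):
    # norm of t v + (1 - t) w with t = 1/2
    return lp_norm([(a + b) / 2.0 for a, b in zip(v, w)], p)

# distinct unit vectors for the l^1 and l^2 norms
m1 = midpoint_norm((1.0, 0.0), (0.0, 1.0), 1)
m2 = midpoint_norm((1.0, 0.0), (0.0, 1.0), 2)
# distinct unit vectors for the max norm
m_inf = midpoint_norm((1.0, 1.0), (1.0, -1.0), float("inf"))
```

The midpoint has norm 1 for $p = 1$ and $p = \infty$, so (147.2) fails and the 
unit ball is not strictly convex in those cases, while for $p = 2$ the midpoint 
has norm strictly less than 1. 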

Let $\lambda$ be a nonzero bounded linear functional on V, and suppose that v, w 
are vectors in V such that $\|v\| = \|w\| = 1$ and 
$\lambda(v) = \lambda(w) = \|\lambda\|_*$. Thus 

(147.3) $\lambda(t \, v + (1 - t) \, w) = t \, \lambda(v) + (1 - t) \, \lambda(w) = \|\lambda\|_*$

when $0 \le t \le 1$, and hence 

(147.4) $\|\lambda\|_* = |\lambda(t \, v + (1 - t) \, w)| \le \|\lambda\|_* \, \|t \, v + (1 - t) \, w\|,$

which implies that 

(147.5) $\|t \, v + (1 - t) \, w\| \ge 1.$

By the triangle inequality, $\|t \, v + (1 - t) \, w\| \le 1$ when $0 \le t \le 1$, 
and so 

(147.6) $\|t \, v + (1 - t) \, w\| = 1.$

If $B_1$ is strictly convex, then it follows that $v = w$ under these conditions. 
Conversely, let us check that this property characterizes strict convexity of $B_1$. 

Suppose that $v, w \in V$, $\|v\| = \|w\| = 1$, $0 < t < 1$, and that 
$a = t \, v + (1 - t) \, w$ satisfies $\|a\| = 1$. As usual, there is a bounded 
linear functional $\lambda$ on V such that $\lambda(a) = \|\lambda\|_* = 1$, because 
of the Hahn-Banach theorem. This implies that $|\lambda(v)|, |\lambda(w)| \le 1$ and 

(147.7) $1 = \lambda(t \, v + (1 - t) \, w) = t \, \lambda(v) + (1 - t) \, \lambda(w),$

so that $\lambda(v) = \lambda(w) = 1$. If we have the uniqueness property described 
in the previous paragraph, then we get that $v = w$, which means that $B_1$ is 
strictly convex. 

If V is not uniformly convex, then there is an $\epsilon > 0$ and sequences of 
vectors $\{v_j\}_{j=1}^\infty$, $\{w_j\}_{j=1}^\infty$ in V such that 
$\|v_j\| = \|w_j\| = 1$ and $\|v_j - w_j\| \ge \epsilon$ for each j, and 

(147.8) $\lim_{j \to \infty} \Big\| \frac{v_j + w_j}{2} \Big\| = 1.$

If V has finite dimension n, then there is a one-to-one linear mapping from 
$\mathbf{R}^n$ onto V. This mapping is also a homeomorphism with respect to the 
standard topology on $\mathbf{R}^n$ and the topology on V determined by the metric 
associated to the norm. In particular, closed and bounded subsets of V are 
compact in this case. Thus we may suppose in addition that 
$\{v_j\}_{j=1}^\infty$, $\{w_j\}_{j=1}^\infty$ converge to some vectors 
$v, w \in V$, respectively, by passing to subsequences. By hypothesis, 
$\|v\| = \|w\| = 1$, $\|v - w\| \ge \epsilon > 0$, and $\|(v + w)/2\| = 1$, which is 
impossible when $B_1$ is strictly convex. This shows that V is uniformly convex 
when V is finite-dimensional and $B_1$ is strictly convex. 
Suppose that $B_1$ is strictly convex, and that 

(147.9) $\|v + w\| = \|v\| + \|w\|$

for some $v, w \in V$ with $v, w \ne 0$. If 

(147.10) $v' = \frac{v}{\|v\|}$, $w' = \frac{w}{\|w\|}$, and $t = \frac{\|v\|}{\|v\| + \|w\|},$

then $1 - t = \|w\|/(\|v\| + \|w\|)$ and 

(147.11) $t \, v' + (1 - t) \, w' = \frac{v + w}{\|v\| + \|w\|}.$

This has norm 1 by hypothesis, so that $v' = w'$ by strict convexity. Equivalently, 
$w = r \, v$, where $r = \|w\|/\|v\|$. 

Let $(X, \mathcal{A})$ be a measurable space, and suppose that 
$\mu \in \mathcal{M}(X, V)$. If $A \subseteq X$ is measurable, then 
$\mu(X) = \mu(A) + \mu(X \setminus A)$, which implies that 

(147.12) $\|\mu(X)\| \le \|\mu(A)\| + \|\mu(X \setminus A)\| \le \|\mu\|(A) + \|\mu\|(X \setminus A) = \|\mu\|(X).$

If $\|\mu(X)\| = \|\mu\|(X)$, then it follows that 

(147.13) $\|\mu(X)\| = \|\mu(A)\| + \|\mu(X \setminus A)\|$

and 

(147.14) $\|\mu(A)\| = \|\mu\|(A)$

for every measurable set $A \subseteq X$. If $\|\mu\|(X) > 0$ and $B_1$ is strictly 
convex, then one can argue as in the preceding paragraph to get that 

(147.15) $\mu(A) = \mu(X) \, \frac{\|\mu\|(A)}{\|\mu\|(X)}$

for every measurable set $A \subseteq X$. 

Suppose now that $f : [a, b] \to V$ is a path of finite length, and let 
$\Lambda_x^y$ be the length of the restriction of f to $[x, y] \subseteq [a, b]$. 
Thus 

(147.16) $\|f(b) - f(a)\| \le \|f(x) - f(a)\| + \|f(b) - f(x)\| \le \Lambda_a^x + \Lambda_x^b = \Lambda_a^b$

when $a \le x \le b$. If $\|f(b) - f(a)\| = \Lambda_a^b$, then it follows that 

(147.17) $\|f(b) - f(a)\| = \|f(x) - f(a)\| + \|f(b) - f(x)\|$

and 

(147.18) $\|f(x) - f(a)\| = \Lambda_a^x$

when $a \le x \le b$. If $\Lambda_a^b > 0$ and $B_1$ is strictly convex, then one 
can argue as before to get that 

(147.19) $f(x) - f(a) = (f(b) - f(a)) \, \frac{\Lambda_a^x}{\Lambda_a^b}$

when $a \le x \le b$. 



148 Minimizing distances 

Let V be a uniformly convex Banach space, and let E be a nonempty
closed convex set in V. Also let v \in V be given, and let \rho be the distance from
v to E,

(148.1)    \rho = \inf \{ \|v - w\| : w \in E \}.

Let \{w_j\}_{j=1}^\infty be a sequence of elements of E such that

(148.2)    \lim_{j \to \infty} \|v - w_j\| = \rho.

Because E is convex, (w_j + w_l)/2 \in E for every j, l \ge 1, and so

(148.3)    \| v - (w_j + w_l)/2 \| \ge \rho.



Suppose that v \notin E, so that \rho > 0, and put

(148.4)    u_j = (v - w_j) / \|v - w_j\|

for each j. Thus \|u_j\| = 1 for each j, and hence \|(u_j + u_l)/2\| \le 1 for every
j, l \ge 1, by the triangle inequality. Using (148.2) and (148.3), it is easy to see
that

(148.5)    \lim_{j, l \to \infty} \| (u_j + u_l)/2 \| = 1.

This implies that

(148.6)    \lim_{j, l \to \infty} \|u_j - u_l\| = 0,

because of uniform convexity. Using (148.2) again, it is easy to check that

(148.7)    \|w_j - w_l\| = \|(v - w_j) - (v - w_l)\| \to 0  as j, l \to \infty.

This shows that \{w_j\}_{j=1}^\infty is a Cauchy sequence, which therefore converges to
some w \in V. We also have that w \in E, because E is closed. Of course,
\|v - w\| = \rho, so that w minimizes the distance to v from elements of E.
Suppose that w' is another element of E such that \|v - w'\| = \rho. If 0 \le t \le 1,
then t w + (1 - t) w' \in E, because E is convex, and so

(148.8)    \| v - (t w + (1 - t) w') \| \ge \rho.

Moreover,

(148.9)    \| v - (t w + (1 - t) w') \| \le t \|v - w\| + (1 - t) \|v - w'\| = \rho,

so that equality holds in (148.8). If u = (v - w)/\rho and u' = (v - w')/\rho, where
\rho > 0 as before, then \|u\| = \|u'\| = 1 and

(148.11)    \| t u + (1 - t) u' \| = 1

when 0 \le t \le 1. Strict convexity of the closed unit ball in V implies that u = u',
which is the same as saying that w = w'.

Let \lambda be a nonzero bounded linear functional on V, and let E be the set of
w \in V such that \lambda(w) = \|\lambda\|_*. This is a closed affine subspace of V, which is
convex in particular. The distance \rho from E to 0 is the same as the infimum
of \|w\| over w \in E, which is equal to 1 in this case, by the definition of the
dual norm of \lambda. The arguments in the previous paragraphs imply that there is
a unique w \in E such that \|w\| = 1. This shows that the supremum is attained
in the definition of the dual norm of a bounded linear functional on a uniformly
convex Banach space.
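
The existence and uniqueness of a nearest point can be illustrated in a very small case. This is only a sketch under stated assumptions: V = R^2 with an \ell^p norm, 1 < p < \infty (these norms are uniformly convex), and E a line segment. The function nearest_on_segment and its ternary search are illustrative choices, not the argument used in these notes; the search is justified because the distance is a convex function of the segment parameter.

```python
def lp_norm(v, p):
    """l^p norm of a finite real vector, for 1 <= p < infinity."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def nearest_on_segment(v, q0, q1, p, iters=200):
    """Minimize ||v - ((1 - t) q0 + t q1)||_p over 0 <= t <= 1.

    Returns (t, point, distance). Ternary search applies because the
    distance is a convex function of t.
    """
    def point(t):
        return [(1 - t) * a + t * b for a, b in zip(q0, q1)]

    def dist(t):
        return lp_norm([x - y for x, y in zip(v, point(t))], p)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dist(m1) < dist(m2):
            hi = m2      # a minimizer lies in [lo, m2]
        else:
            lo = m1      # a minimizer lies in [m1, hi]
    t = (lo + hi) / 2
    return t, point(t), dist(t)


# E is the segment from (0, 1) to (1, 0), and v = (1, 1).
t2, w2, rho2 = nearest_on_segment((1.0, 1.0), (0.0, 1.0), (1.0, 0.0), p=2)
t3, w3, rho3 = nearest_on_segment((1.0, 1.0), (0.0, 1.0), (1.0, 0.0), p=3)
```

By symmetry the minimizer is the midpoint of the segment for each p, and the computed distance should be close to 2^{1/p}/2.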

149 Another approximation argument 

Let V_1 be a real vector space with a norm \|v\|. As usual, everything in this
section can also be applied to complex vector spaces, since they are real vector
spaces too. Suppose that V_1 is uniformly convex, so that for each \epsilon > 0 there is
a \delta(\epsilon) > 0 such that for every v, w \in V_1 with \|v\| = \|w\| = 1 and

(149.1)    \| (v + w)/2 \| > 1 - \delta(\epsilon)

we have that \|v - w\| < \epsilon, as in Section 143. Although uniform convexity follows
from strict convexity of the unit ball in finite dimensions, as in Section 147, the
estimates in this section will only depend on \delta(\epsilon), and not on the particular
norm \|v\|, or the dimension of V_1. Hence these estimates hold uniformly over all
finite-dimensional subspaces of a uniformly convex Banach space, for instance.

Let (X, \mathcal{A}, \mu) be a probability space, and let \mathcal{B} be a \sigma-subalgebra of \mathcal{A}. As
in Section 120, it is easy to deal with integrals of V_1-valued functions on X,
by integrating the components of these functions with respect to a basis for V_1.
Similarly, the conditional expectation of a V_1-valued function on X with respect
to \mathcal{B} can be defined by taking the conditional expectation of the components of
the function with respect to a basis. It is easy to see that this does not depend
on the choice of a basis for V_1, using the linearity of integration and conditional
expectation.

Let f be an integrable V_1-valued function on X with respect to \mu, which
means that the components of f with respect to a basis are integrable real-valued
functions. Also let f_\mathcal{B} = E(f | \mathcal{B}) be the conditional expectation of f
with respect to \mathcal{B}, as usual. Remember that

(149.2)    \|f_\mathcal{B}\| \le E(\|f\| | \mathcal{B})

almost everywhere on X, as in Section 120, so that

(149.3)    \int_X \|f_\mathcal{B}\| d\mu \le \int_X \|f\| d\mu,

in particular. Let \theta be a small positive real number, and suppose that

(149.4)    \int_X \|f\| d\mu < \int_X \|f_\mathcal{B}\| d\mu + \theta.

This implies that

(149.5)    \int_X (E(\|f\| | \mathcal{B}) - \|f_\mathcal{B}\|) d\mu < \theta,

because the integrals of \|f\| and E(\|f\| | \mathcal{B}) over X are the same, since X \in \mathcal{B}.
Let \eta be another small positive real number, and put

(149.6)    X_1 = \{x \in X : \|f_\mathcal{B}(x)\| > (1 - \eta) E(\|f\| | \mathcal{B})(x)\},

(149.7)    X_2 = \{x \in X : \|f_\mathcal{B}(x)\| \le (1 - \eta) E(\|f\| | \mathcal{B})(x)\}.

Thus X_1, X_2 \in \mathcal{B}, because \|f_\mathcal{B}\|, E(\|f\| | \mathcal{B}) are measurable with respect to \mathcal{B}.
Note that

(149.8)    \eta \int_{X_2} \|f\| d\mu = \int_{X_2} \eta E(\|f\| | \mathcal{B}) d\mu
                \le \int_X (E(\|f\| | \mathcal{B}) - \|f_\mathcal{B}\|) d\mu < \theta,

where we use the fact that X_2 \in \mathcal{B} in the first step, and (149.2) and the definition
of X_2 in the second step.

In order to see what happens on X_1, it will be convenient to use linear
functionals on V_1. Of course, every linear functional on V_1 is bounded, because
V_1 has finite dimension, and the dual V_1^* of V_1 has finite dimension equal to the
dimension of V_1. In particular, there is a sequence of linear functionals \{\lambda_j\}_{j=1}^\infty
on V_1 such that \|\lambda_j\|_* = 1 for each j and the \lambda_j's are dense in the set of \lambda \in V_1^*
with \|\lambda\|_* = 1. As usual, for each v \in V_1 there is a \lambda \in V_1^* such that \|\lambda\|_* = 1
and \lambda(v) = \|v\|, because of the Hahn-Banach theorem. This implies that

(149.9)    \|v\| = \sup_{j \ge 1} \lambda_j(v)

for each v \in V_1, by approximating \lambda by \lambda_j's, and using the fact that \|\lambda_j\|_* = 1
for each j.
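Put

(149.10)    A_j = \{x \in X : \lambda_j(f_\mathcal{B}(x)) > (1 - \eta) E(\|f\| | \mathcal{B})(x)\}

for each j \ge 1, so that A_j \subseteq X_1 and A_j \in \mathcal{B} for each j, and

(149.11)    \bigcup_{j=1}^\infty A_j = X_1,

by (149.9). It is better to have disjoint sets, and so we let B_1 = A_1 and
B_n = A_n \setminus (\bigcup_{j=1}^{n-1} A_j) when n \ge 2. Thus B_n \subseteq A_n \subseteq X_1 and B_n \in \mathcal{B} for each
n, B_l \cap B_n = \emptyset when l < n, and

(149.12)    \bigcup_{n=1}^\infty B_n = \bigcup_{j=1}^\infty A_j = X_1,

as before. Note that \lambda \circ f_\mathcal{B} = E(\lambda \circ f | \mathcal{B}) for each linear functional \lambda on V_1.
This implies that

(149.13)    \int_{B_n} \lambda_n \circ f_\mathcal{B} d\mu = \int_{B_n} \lambda_n \circ f d\mu,

since B_n \in \mathcal{B}, while

(149.14)    \int_{B_n} E(\|f\| | \mathcal{B}) d\mu = \int_{B_n} \|f\| d\mu.

Because B_n \subseteq A_n,

(149.15)    \int_{B_n} \lambda_n \circ f_\mathcal{B} d\mu > (1 - \eta) \int_{B_n} E(\|f\| | \mathcal{B}) d\mu

when \mu(B_n) > 0, and hence

(149.16)    \int_{B_n} \lambda_n \circ f d\mu > (1 - \eta) \int_{B_n} \|f\| d\mu.

Equivalently,

(149.17)    \int_{B_n} (\|f\| - \lambda_n \circ f) d\mu < \eta \int_{B_n} \|f\| d\mu

when \mu(B_n) > 0, where the integrand on the left is nonnegative, since \|\lambda_n\|_* = 1.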
Let \epsilon > 0 be given, and put \delta = \delta(\epsilon). Also put

(149.18)    B_{n,1} = \{x \in B_n : \lambda_n(f(x)) > (1 - \delta) \|f(x)\|\},

(149.19)    B_{n,2} = \{x \in B_n : \lambda_n(f(x)) \le (1 - \delta) \|f(x)\|\}.

Thus

(149.20)    \delta \int_{B_{n,2}} \|f\| d\mu \le \int_{B_n} (\|f\| - \lambda_n \circ f) d\mu < \eta \int_{B_n} \|f\| d\mu

when \mu(B_n) > 0, by (149.17). As before, we shall be interested in \eta's that are
small compared to \delta, so that the integral of \|f\| over B_{n,2} is relatively small.

If x \in X_1, then f_\mathcal{B}(x) \ne 0, and we put a(x) = f_\mathcal{B}(x) / \|f_\mathcal{B}(x)\|. Otherwise, if
x \in X_2, then we put a(x) = 0. If x \in B_n \subseteq A_n \subseteq X_1, then

(149.21)    \lambda_n(a(x)) > 1 - \eta,

using also (149.2). If x \in B_{n,1}, then f(x) \ne 0, and we put b(x) = f(x) / \|f(x)\|.
Note that

(149.22)    \lambda_n(b(x)) > 1 - \delta,

by definition of B_{n,1}. Thus \|a(x)\| = \|b(x)\| = 1 and

(149.23)    \| (a(x) + b(x))/2 \| \ge \lambda_n((a(x) + b(x))/2) > 1 - (\delta + \eta)/2 \ge 1 - \delta

when x \in B_{n,1} and \eta \le \delta. This implies that \|a(x) - b(x)\| < \epsilon, because of
uniform convexity. Equivalently,

(149.24)    \| f(x) - a(x) \|f(x)\| \| < \epsilon \|f(x)\|

when x \in B_{n,1} and \eta \le \delta.
It follows that

(149.25)    \int_{B_n} \| f(x) - a(x) \|f(x)\| \| d\mu \le \int_{B_{n,1}} \epsilon \|f\| d\mu + 2 \int_{B_{n,2}} \|f\| d\mu

when \eta \le \delta(\epsilon), and hence

(149.26)    \int_{B_n} \| f(x) - a(x) \|f(x)\| \| d\mu \le (\epsilon + 2 \delta(\epsilon)^{-1} \eta) \int_{B_n} \|f\| d\mu,

because of (149.20). This also holds trivially when \eta > \delta(\epsilon), since the coefficient
on the right would be greater than 2. Summing over n, we get that

(149.27)    \int_{X_1} \| f(x) - a(x) \|f(x)\| \| d\mu \le (\epsilon + 2 \delta(\epsilon)^{-1} \eta) \int_{X_1} \|f\| d\mu.

Combining this with (149.8), we obtain

(149.28)    \int_X \| f(x) - a(x) \|f(x)\| \| d\mu \le (\epsilon + 2 \delta(\epsilon)^{-1} \eta) \int_X \|f\| d\mu + \eta^{-1} \theta.

Alternatively, one might prefer to take a(x) = f_\mathcal{B}(x) / \|f_\mathcal{B}(x)\| for every x in
X such that f_\mathcal{B}(x) \ne 0, even when x \in X_2. This would ensure that a(x) does
not depend on f even indirectly, through the definition of X_2. In this case, we
would get that

(149.29)    \int_X \| f(x) - a(x) \|f(x)\| \| d\mu \le (\epsilon + 2 \delta(\epsilon)^{-1} \eta) \int_X \|f\| d\mu + 2 \eta^{-1} \theta,

which is to say that we would multiply \eta^{-1} \theta by 2 in the previous estimate. In
both situations, a(x) is measurable with respect to \mathcal{B}, because f_\mathcal{B} is measurable
with respect to \mathcal{B} and X_2 \in \mathcal{B}.
150 Examples in \ell^p



Let a_1, a_2, ... be a sequence of real or complex numbers, and consider

(150.1)    f_n(x) = \sum_{j=1}^n a_j r_j(x) \delta_j.

Here r_1(x), r_2(x), ... are the Rademacher functions, and \delta_j = \{\delta_{j,l}\}_{l=1}^\infty is the
sequence defined by \delta_{j,l} = 1 when j = l and \delta_{j,l} = 0 when j \ne l. Thus
\{f_n\}_{n=1}^\infty is a martingale on the dyadic unit interval with respect to the usual
filtration associated to dyadic subintervals, and with values in the vector space
of sequences of real or complex numbers, as appropriate. In particular, \{f_n\}_{n=1}^\infty
is a martingale with values in \ell^p for each p, 1 \le p \le \infty. Note that the \ell^p norm
of f_n(x) is equal to the \ell^p norm of the finite sequence a_1, ..., a_n for each x and
n. Hence the L^1 norm of \|f_n(x)\|_{\ell^p} is equal to the \ell^p norm of a_1, ..., a_n for each
n. It follows that the L^1 norm of \|f_n(x)\|_{\ell^p} is uniformly bounded over n if and
only if \{a_j\}_{j=1}^\infty is in \ell^p. If \{a_j\}_{j=1}^\infty \in \ell^p and p < \infty, then it is easy to see that
f_n(x) converges in \ell^p as n \to \infty for each x. Similarly, if \{a_j\}_{j=1}^\infty converges to
0, then f_n(x) converges in c_0 equipped with the \ell^\infty norm as n \to \infty for each x.
If \{a_j\}_{j=1}^\infty is bounded, then f_n(x) is uniformly bounded in \ell^\infty, but it does not
converge in the \ell^\infty norm as n \to \infty for any x unless \{a_j\}_{j=1}^\infty converges to 0.
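
The fact that the \ell^p norm of f_n(x) does not depend on x can be checked by brute force for small n. The following is a minimal sketch, in which a sign pattern (s_1, ..., s_n) stands in for the values r_1(x), ..., r_n(x) of the Rademacher functions; the helper names lp_norm and f_n are hypothetical, and the coefficients are arbitrary sample data.

```python
from itertools import product


def lp_norm(seq, p):
    """l^p norm of a finite sequence, with p = float('inf') allowed."""
    if p == float("inf"):
        return max(abs(c) for c in seq)
    return sum(abs(c) ** p for c in seq) ** (1.0 / p)


def f_n(a, signs):
    """First n coordinates of sum_j a_j r_j(x) delta_j, where
    signs[j] plays the role of r_j(x); all other coordinates are 0."""
    return [s * c for s, c in zip(signs, a)]


a = [3.0, -1.0, 0.5, 2.0]   # sample coefficients a_1, ..., a_4
p = 3

# Collect the norm of f_n(x) over all 2^n sign patterns.
norms = {lp_norm(f_n(a, s), p) for s in product((1, -1), repeat=len(a))}
reference = lp_norm(a, p)
```

Every sign pattern gives the same value, namely the \ell^p norm of a_1, ..., a_n, because |s_j a_j| = |a_j| for each j.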



151 Uniform convergence 



Let (V, \|v\|) be a real or complex Banach space, and let v_1, v_2, ... be a sequence
of elements of V. As in Section 50, let X be the set of sequences x = \{x_j\}_{j=1}^\infty
with x_j = 1 or -1 for each j, which is the same as the Cartesian product of a
sequence of copies of \{1, -1\}. Consider

(151.1)    f_n(x) = \sum_{j=1}^n x_j v_j

for each positive integer n and x \in X. This is basically the same as the sequence
of functions considered in the previous section when V = \ell^p and v_j = a_j \delta_j, since
r_j(x) = x_j is another version of the Rademacher functions. Let us check that
\{f_n\}_{n=1}^\infty converges uniformly on X when \sum_{j \in Z_+} v_j converges in the generalized
sense, as in Section 13. In particular, \{f_n\}_{n=1}^\infty converges uniformly on X when
\sum_{j=1}^\infty v_j converges absolutely. In this case, it is very easy to show directly that
\{f_n\}_{n=1}^\infty converges uniformly, by the same argument as in Weierstrass' M-test.

Suppose that \sum_{j \in Z_+} v_j converges in the generalized sense, which implies
that it satisfies the generalized Cauchy criterion, as in Section 14. This means
that for each \epsilon > 0 there is a finite set A_\epsilon \subseteq Z_+ such that

(151.2)    \| \sum_{j \in B} v_j \| < \epsilon
for every finite set B \subseteq Z_+ with A_\epsilon \cap B = \emptyset. Let L_\epsilon be the maximum of the
elements of A_\epsilon, with L_\epsilon = 0 when A_\epsilon = \emptyset. If n > l, then

(151.3)    f_n(x) - f_l(x) = \sum_{j=l+1}^n x_j v_j = \sum_{j \in B_{l,n}^+} v_j - \sum_{j \in B_{l,n}^-} v_j,

where B_{l,n}^+, B_{l,n}^- are the sets of positive integers j such that l < j \le n and
x_j = 1 or -1, respectively. If l \ge L_\epsilon, then B_{l,n}^+ \cap A_\epsilon = B_{l,n}^- \cap A_\epsilon = \emptyset, and so

(151.4)    \|f_n(x) - f_l(x)\| \le \| \sum_{j \in B_{l,n}^+} v_j \| + \| \sum_{j \in B_{l,n}^-} v_j \| < \epsilon + \epsilon = 2 \epsilon.

This shows that \{f_n\}_{n=1}^\infty is a Cauchy sequence with respect to the supremum
norm on the space of V-valued functions on X. It follows that \{f_n\}_{n=1}^\infty converges
uniformly on X, because V is complete. As usual, one can observe first that
\{f_n(x)\}_{n=1}^\infty is a Cauchy sequence in V for each x \in X, which converges because
of completeness, and then check that \{f_n\}_{n=1}^\infty converges uniformly on X to the
pointwise limit, because of the uniform version of the Cauchy condition.

Conversely, suppose that \{f_n\}_{n=1}^\infty converges uniformly on X, and hence
satisfies the uniform version of the Cauchy condition. This means that for each
\epsilon > 0 there is an N_\epsilon \ge 0 such that

(151.5)    \|f_n(x) - f_l(x)\| < \epsilon

for every n > l \ge N_\epsilon and x \in X, or equivalently

(151.6)    \| \sum_{j=l+1}^n x_j v_j \| < \epsilon

for every n > l \ge N_\epsilon and x \in X. Let B \subseteq Z_+ be a nonempty finite set whose
minimal element is greater than N_\epsilon. If y, z \in X are defined by y_j = 1 for every
j, z_j = 1 when j \in B, and z_j = -1 otherwise, then

(151.7)    \sum_{j=N_\epsilon+1}^n y_j v_j + \sum_{j=N_\epsilon+1}^n z_j v_j = 2 \sum_{j \in B} v_j

when the maximal element of B is less than or equal to n. Hence

(151.8)    2 \| \sum_{j \in B} v_j \| \le \| \sum_{j=N_\epsilon+1}^n y_j v_j \| + \| \sum_{j=N_\epsilon+1}^n z_j v_j \| < \epsilon + \epsilon = 2 \epsilon

by (151.6). This is the same as saying that \| \sum_{j \in B} v_j \| < \epsilon when B \subseteq Z_+
is a finite set disjoint from \{1, ..., N_\epsilon\}, which implies that \sum_{j \in Z_+} v_j satisfies
the generalized Cauchy criterion. Thus \sum_{j \in Z_+} v_j converges in the generalized
sense, because V is complete.
Actually, the same conclusion holds when \{f_n(x)\}_{n=1}^\infty converges in V for
every x \in X, which is the same as saying that \sum_{j=1}^\infty x_j v_j converges for every
x \in X. To see this, suppose for the sake of a contradiction that \sum_{j \in Z_+} v_j does
not satisfy the generalized Cauchy condition. This means that there is an \epsilon > 0
such that for each finite set A \subseteq Z_+ there is a finite set B \subseteq Z_+ such that
A \cap B = \emptyset and

(151.9)    \| \sum_{j \in B} v_j \| \ge \epsilon.

By applying this repeatedly, we can get an infinite sequence B_1, B_2, ... of finite
subsets of Z_+ such that the maximal element of B_l is strictly less than the
minimal element of B_{l+1} for each l, and (151.9) holds with B = B_l for each l.
Let y, z \in X be defined by y_j = 1 for each j, z_j = 1 when j \in B_l for some l \ge 1,
and z_j = -1 otherwise. If b_n is the maximal element of B_n, then

(151.10)    \sum_{j=1}^{b_n} y_j v_j + \sum_{j=1}^{b_n} z_j v_j = 2 \sum_{l=1}^n ( \sum_{j \in B_l} v_j ).

Thus the convergence of \sum_{j=1}^\infty y_j v_j and \sum_{j=1}^\infty z_j v_j implies the convergence of

(151.11)    \sum_{l=1}^\infty ( \sum_{j \in B_l} v_j ).

This implies in turn that

(151.12)    \lim_{l \to \infty} \sum_{j \in B_l} v_j = 0,

a contradiction. This shows that \sum_{j \in Z_+} v_j satisfies the generalized Cauchy
condition, and hence converges in the generalized sense, because V is complete.
Therefore \sum_{j \in Z_+} v_j converges in the generalized sense if and only if \sum_{j=1}^\infty x_j v_j
converges for every x \in X, in which case the partial sums f_n converge uniformly
on X.
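
A scalar example shows why ordinary convergence of \sum v_j is not enough. The following sketch takes V = R and v_j = (-1)^j / j, which is an illustrative choice and not an example from these notes. The ordinary series converges, but the sign pattern x_j = (-1)^j turns it into the harmonic series, so that the signed sums do not converge for every x, in line with the equivalence stated above.

```python
import math


def signed_partial_sum(v, x, n):
    """Return f_n(x) = sum_{j=1}^n x(j) v(j) for scalar terms."""
    return sum(x(j) * v(j) for j in range(1, n + 1))


def v(j):
    return (-1) ** j / j      # conditionally convergent terms


def all_plus(j):
    return 1                  # the sequence y with y_j = 1


def alternating(j):
    return (-1) ** j          # a sign pattern that undoes the cancellation


n = 10000
s_plus = signed_partial_sum(v, all_plus, n)     # close to -log 2
s_alt = signed_partial_sum(v, alternating, n)   # harmonic number H_n
```

With n = 10000, s_plus is within 10^{-3} of -\log 2, while s_alt is the harmonic number H_n, which grows like \log n.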



152 Bounded sums 



Let V be a real or complex vector space with a norm \|v\|, and let \{1, -1\}^n
be the Cartesian product of n copies of \{1, -1\}, consisting of all sequences
\epsilon = \{\epsilon_j\}_{j=1}^n of length n with \epsilon_j = 1 or -1 for each j. Also let Z(V) be the
collection of sequences v_1, v_2, ... of vectors in V for which the sums \sum_{j=1}^n \epsilon_j v_j
are uniformly bounded in V over \epsilon \in \{1, -1\}^n and all positive integers n. This
is a vector space over the real or complex numbers, as appropriate, with respect
to termwise addition and scalar multiplication. If \{v_j\}_{j=1}^\infty \in Z(V), then put

(152.1)    \|\{v_j\}_{j=1}^\infty\|_{Z(V)} = \sup \{ \| \sum_{j=1}^n \epsilon_j v_j \| : \epsilon \in \{1, -1\}^n, n \in Z_+ \}.
Note that Z(V) is a linear subspace of the space X(V) of sequences \{v_j\}_{j=1}^\infty of
vectors in V with bounded partial sums \sum_{j=1}^n v_j, discussed in Section 25, since
we can take \epsilon_j = 1 for each j. Similarly,

(152.2)    \|\{v_j\}_{j=1}^\infty\|_{X(V)} \le \|\{v_j\}_{j=1}^\infty\|_{Z(V)}

for each \{v_j\}_{j=1}^\infty \in Z(V). It is easy to see that \|\{v_j\}_{j=1}^\infty\|_{Z(V)} is a norm on
Z(V), and in particular that v_j = 0 for every j when \|\{v_j\}_{j=1}^\infty\|_{Z(V)} = 0.



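Let \{v_j\}_{j=1}^\infty be a sequence of vectors in V, let B be a finite nonempty set of
positive integers, and let n be the maximal element of B. If \alpha, \beta \in \{1, -1\}^n are
defined by \alpha_j = 1 for each j, \beta_j = 1 when j \in B, and \beta_j = -1 otherwise, then

(152.3)    \sum_{j=1}^n \alpha_j v_j + \sum_{j=1}^n \beta_j v_j = 2 \sum_{j \in B} v_j,

and hence

(152.4)    2 \| \sum_{j \in B} v_j \| \le \| \sum_{j=1}^n \alpha_j v_j \| + \| \sum_{j=1}^n \beta_j v_j \|.

If \{v_j\}_{j=1}^\infty \in Z(V), then we get that

(152.5)    \| \sum_{j \in B} v_j \| \le \|\{v_j\}_{j=1}^\infty\|_{Z(V)},

which implies that \{v_j\}_{j=1}^\infty is in the space Y(Z_+, V) discussed in Section 27,
and that

(152.6)    \|\{v_j\}_{j=1}^\infty\|_{Y(Z_+,V)} \le \|\{v_j\}_{j=1}^\infty\|_{Z(V)}.

Conversely, if \{v_j\}_{j=1}^\infty \in Y(Z_+, V), n \in Z_+, and \epsilon \in \{1, -1\}^n, then

(152.7)    \sum_{j=1}^n \epsilon_j v_j = \sum_{1 \le j \le n, \epsilon_j = 1} v_j - \sum_{1 \le j \le n, \epsilon_j = -1} v_j,

which implies that

(152.8)    \| \sum_{j=1}^n \epsilon_j v_j \| \le \| \sum_{1 \le j \le n, \epsilon_j = 1} v_j \| + \| \sum_{1 \le j \le n, \epsilon_j = -1} v_j \|
                \le 2 \|\{v_j\}_{j=1}^\infty\|_{Y(Z_+,V)}.

Thus \{v_j\}_{j=1}^\infty \in Z(V) and

(152.9)    \|\{v_j\}_{j=1}^\infty\|_{Z(V)} \le 2 \|\{v_j\}_{j=1}^\infty\|_{Y(Z_+,V)},

which shows that Y(Z_+, V) = Z(V), and that the corresponding norms are
equivalent.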
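
The comparison between the two norms can be tested by brute force on a short finite sequence, extended by zeros. This is only a sketch under an assumption: the Y(Z_+, V) norm is taken to be the supremum of \| \sum_{j \in B} v_j \| over nonempty finite sets B, as (152.5) and (152.6) suggest, since its definition is given in another section. The function names y_norm and z_norm and the sample vectors in R^2 are hypothetical.

```python
import math
from itertools import combinations, product


def norm(v):
    """Euclidean norm on R^2, used here as the norm on V."""
    return math.sqrt(sum(c * c for c in v))


def vec_sum(vectors):
    """Coordinatewise sum of a nonempty list of vectors."""
    return tuple(sum(cs) for cs in zip(*vectors))


def y_norm(vs):
    """Assumed Y norm: max of ||sum_{j in B} v_j|| over nonempty B."""
    return max(norm(vec_sum(list(B)))
               for r in range(1, len(vs) + 1)
               for B in combinations(vs, r))


def z_norm(vs):
    """Z norm as in (152.1): max over n and over all signs of length n."""
    best = 0.0
    for n in range(1, len(vs) + 1):
        for eps in product((1, -1), repeat=n):
            signed = [tuple(e * c for c in v) for e, v in zip(eps, vs[:n])]
            best = max(best, norm(vec_sum(signed)))
    return best


vs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 1.0), (0.5, -2.0)]
y, z = y_norm(vs), z_norm(vs)
```

For any such finite sequence one should find y <= z <= 2 y, which is the content of (152.6) and (152.9).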

Let Z_0(V) be the closure in Z(V) of the collection of sequences \{v_j\}_{j=1}^\infty
of vectors in V with v_j = 0 for all but finitely many j. This is the same as
the closure of this set in Y(Z_+, V), which is also the same as the collection
Y_0(Z_+, V) of sequences \{v_j\}_{j=1}^\infty of elements of V such that \sum_{j \in Z_+} v_j satisfies
the generalized Cauchy criterion. If V is complete, then this is the same as the
collection of sequences \{v_j\}_{j=1}^\infty of vectors in V such that \sum_{j \in Z_+} v_j converges
in the generalized sense, as usual. This characterization of Z_0(V) is basically
equivalent to the discussion in the previous section.

Suppose that \{v_j\}_{j=1}^\infty is a sequence of vectors in V that is not in Z(V). Thus
for each N \ge 1 there is an n \in Z_+ and an \epsilon \in \{1, -1\}^n such that

(152.10)    \| \sum_{j=1}^n \epsilon_j v_j \| > N.

Equivalently, for each l, L \ge 1 there is an n \ge l and \epsilon_l, ..., \epsilon_n \in \{1, -1\} such
that

(152.11)    \| \sum_{j=l}^n \epsilon_j v_j \| > L + \sum_{j=1}^{l-1} \|v_j\|.

This follows from the previous statement by taking N = L + 2 \sum_{j=1}^{l-1} \|v_j\| and
using the triangle inequality to get that

(152.12)    \| \sum_{j=1}^n \epsilon_j v_j \| \le \| \sum_{j=l}^n \epsilon_j v_j \| + \sum_{j=1}^{l-1} \|v_j\|.

Applying (152.11) repeatedly, we get a strictly increasing sequence n_1, n_2, ... of
positive integers and a sequence \epsilon_1, \epsilon_2, ... with \epsilon_j \in \{1, -1\} for each j such that

(152.13)    \| \sum_{j=1}^{n_1} \epsilon_j v_j \| > 1

and

(152.14)    \| \sum_{j=n_k+1}^{n_{k+1}} \epsilon_j v_j \| > k + 1 + \sum_{j=1}^{n_k} \|v_j\|

for each k \ge 1. Using the triangle inequality again, we get that

(152.15)    \| \sum_{j=n_k+1}^{n_{k+1}} \epsilon_j v_j \| \le \| \sum_{j=1}^{n_{k+1}} \epsilon_j v_j \| + \sum_{j=1}^{n_k} \|v_j\|

for each k. Hence

(152.16)    \| \sum_{j=1}^{n_k} \epsilon_j v_j \| > k

for each k, so that the partial sums \sum_{j=1}^n \epsilon_j v_j are not uniformly bounded over
n \in Z_+ even for this single sequence \epsilon = \{\epsilon_j\}_{j=1}^\infty. If \{v_j\}_{j=1}^\infty is a sequence of
vectors in V for which the partial sums \sum_{j=1}^n \epsilon_j v_j are uniformly bounded over
n \in Z_+ for each sequence \epsilon = \{\epsilon_j\}_{j=1}^\infty of elements of \{1, -1\}, then it follows
that \{v_j\}_{j=1}^\infty \in Z(V).
153 Bounded coefficients

Let E be a nonempty set, and let V be a real or complex vector space with a
norm \|v\|. Also let f \in Y(E, V) be given, as in Section 27. If A \subseteq E, then
we let 1_A(x) be the indicator function associated to A on E, equal to 1 when
x \in A and to 0 when x \in E \setminus A. Thus

(153.1)    \sum_{x \in B} 1_A(x) f(x) = \sum_{x \in A \cap B} f(x)

for every finite set B \subseteq E, which implies that 1_A f \in Y(E, V), and that

(153.2)    \|1_A f\|_{Y(E,V)} \le \|f\|_{Y(E,V)}.

Now let a be a real-valued function on E such that 0 \le a(x) \le 1 for every
x \in E. Let A_1 be the set of x \in E such that a(x) \ge 1/2, and put

(153.3)    a_1(x) = a(x) - (1/2) 1_{A_1}(x).

Thus 0 \le a_1(x) \le 1/2 for every x \in E, and we can repeat the process by taking
A_2 to be the set of x \in E such that a_1(x) \ge 1/4. Continuing in this manner,
we get a sequence of subsets A_1, A_2, ... of E such that

(153.4)    a(x) = \sum_{j=1}^\infty 2^{-j} 1_{A_j}(x)

for each x \in E. If f \in Y(E, V), as before, then it follows that a f \in Y(E, V)
too, and that

(153.5)    \|a f\|_{Y(E,V)} \le \|f\|_{Y(E,V)}.

If a is a bounded nonnegative real-valued function on E, then we get that
a f \in Y(E, V), with

(153.6)    \|a f\|_{Y(E,V)} \le \|a\|_\infty \|f\|_{Y(E,V)}.

If a is any bounded real-valued function on E, then we can apply the previous
remarks to the positive and negative parts of a, to get that a f \in Y(E, V) and

(153.7)    \|a f\|_{Y(E,V)} \le 2 \|a\|_\infty \|f\|_{Y(E,V)}.

If V is complex and a is a bounded complex-valued function on E, then we can
apply this to the real and imaginary parts of a, to get that a f \in Y(E, V) and

(153.8)    \|a f\|_{Y(E,V)} \le 4 \|a\|_\infty \|f\|_{Y(E,V)}.

In particular, multiplication by a defines a bounded linear operator on Y(E, V)
in each case.
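
The construction of the sets A_1, A_2, ... amounts to reading off binary digits of a(x) one at a time. The following is a minimal sketch of that greedy step on a finite set E, using the threshold a_{j-1}(x) \ge 2^{-j}; the function names dyadic_sets and reconstruct, and the sample values of a, are hypothetical.

```python
def dyadic_sets(a, depth):
    """Return [A_1, ..., A_depth] for a dict a mapping points to [0, 1].

    At step j, A_j is the set of points whose current remainder is at
    least 2^{-j}, and 2^{-j} is then subtracted on A_j, as in (153.3).
    """
    remainder = dict(a)
    sets = []
    for j in range(1, depth + 1):
        step = 2.0 ** -j
        A_j = {x for x, r in remainder.items() if r >= step}
        for x in A_j:
            remainder[x] -= step
        sets.append(A_j)
    return sets


def reconstruct(sets, x):
    """Partial sum of sum_j 2^{-j} 1_{A_j}(x), as in (153.4)."""
    return sum(2.0 ** -(j + 1) for j, A in enumerate(sets) if x in A)


a = {"p": 0.0, "q": 0.3, "r": 0.75, "s": 1.0}   # sample values of a(x)
sets = dyadic_sets(a, depth=30)
errors = {x: abs(a[x] - reconstruct(sets, x)) for x in a}
```

After depth steps the remainder at each point lies between 0 and 2^{-depth}, which is why the partial sums in (153.4) converge to a(x) uniformly in x.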

Of course, if f(x) \ne 0 for only finitely many x \in E, then a f has the same
property. This implies that a f \in Y_0(E, V) when f \in Y_0(E, V) and a is bounded,
because Y_0(E, V) is the closure in Y(E, V) of the linear subspace of functions
on E with finite support. Equivalently, \sum_{x \in E} a(x) f(x) satisfies the generalized
Cauchy condition when \sum_{x \in E} f(x) satisfies the generalized Cauchy condition
and a is bounded. If V is complete, then it follows that \sum_{x \in E} a(x) f(x)
converges in the generalized sense when \sum_{x \in E} f(x) converges in the generalized
sense and a is bounded.



154 Another norm 

Let E be a nonempty set, and let V be a real or complex vector space with a
norm \|v\|. Suppose that f(x) is a V-valued function on E, and consider sums
of the form

(154.1)    \sum_{x \in B} \beta(x) f(x),

where B \subseteq E is a nonempty finite set, and \beta is a function on B with values in
\{1, -1\}. Of course, this is the same as

(154.2)    \sum_{x \in B_+} f(x) - \sum_{x \in B_-} f(x),

where B_\pm = \{x \in B : \beta(x) = \pm 1\}. If Z(E, V) is the space of V-valued
functions on E for which these sums have bounded norm, then it is easy to see
that Z(E, V) is the same as the space Y(E, V) discussed in Section 57. More
precisely, Z(E, V) \subseteq Y(E, V) because one can take \beta(x) = 1 for each x \in B,
while Y(E, V) \subseteq Z(E, V) by the triangle inequality. If f \in Y(E, V) = Z(E, V),
then put

(154.3)    \|f\|_{Z(E,V)} = \sup \| \sum_{x \in B} \beta(x) f(x) \|,

where the supremum is taken over all nonempty finite sets B \subseteq E and functions
\beta : B \to \{-1, 1\}. Note that

(154.4)    \|f\|_{Y(E,V)} \le \|f\|_{Z(E,V)} \le 2 \|f\|_{Y(E,V)},

for the same reasons that Y(E, V) = Z(E, V).

If E = Z_+, then the Z(E, V) norm reduces to the Z(V) norm described in
Section 152, where we identify V-valued functions on Z_+ with sequences whose
terms are in V. Clearly

(154.5)    \|f\|_{Z(V)} \le \|f\|_{Z(Z_+,V)}

for each f \in Z(V) = Y(Z_+, V), because the Z(V) norm corresponds to taking B to
be of the form \{1, ..., n\}, n \in Z_+, in the previous paragraph. Conversely, if B
is any nonempty finite set of positive integers, and \beta : B \to \{1, -1\}, then we
can take n to be the maximal element of B, and put \epsilon_j = \epsilon'_j = \beta(j) when j \in B,
and \epsilon_j = 1 and \epsilon'_j = -1 when 1 \le j \le n and j \notin B. Thus

(154.6)    2 \sum_{j \in B} \beta(j) f(j) = \sum_{j=1}^n \epsilon_j f(j) + \sum_{j=1}^n \epsilon'_j f(j),

and hence

(154.7)    2 \| \sum_{j \in B} \beta(j) f(j) \| \le 2 \|f\|_{Z(V)}.

This implies that

(154.8)    \|f\|_{Z(Z_+,V)} \le \|f\|_{Z(V)},

by taking the supremum over B, \beta.

Let E be any nonempty set again, and let A, B be disjoint nonempty finite
subsets of E. Also let \alpha, \beta be functions on A, B, respectively, with values in
\{1, -1\}. Let \gamma, \gamma' be the functions on C = A \cup B defined by \gamma(x) = \gamma'(x) = \alpha(x)
when x \in A and \gamma(x) = -\gamma'(x) = \beta(x) when x \in B. If f(x) is any V-valued
function on E, then

(154.9)    2 \sum_{x \in A} \alpha(x) f(x) = \sum_{x \in C} \gamma(x) f(x) + \sum_{x \in C} \gamma'(x) f(x)

and

(154.10)    2 \sum_{x \in B} \beta(x) f(x) = \sum_{x \in C} \gamma(x) f(x) - \sum_{x \in C} \gamma'(x) f(x).

In particular,

(154.11)    2 \| \sum_{x \in A} \alpha(x) f(x) \| \le \| \sum_{x \in C} \gamma(x) f(x) \| + \| \sum_{x \in C} \gamma'(x) f(x) \|,

as in the preceding paragraph.

Suppose now that V is uniformly convex, and let \epsilon > 0 be given. As in
Section 143, there is a \delta_1 > 0 such that \|v - w\| < \epsilon whenever v, w \in V
satisfy \|v\|, \|w\| \le 1 and \|(v + w)/2\| > 1 - \delta_1. Equivalently, \|v - w\| < \epsilon R when
\|v\|, \|w\| \le R and \|(v + w)/2\| > (1 - \delta_1) R for any R > 0, by dividing by R. Let
f \in Y(E, V) with f \ne 0 be given, and let us apply this with R = \|f\|_{Z(E,V)}.
By definition of \|f\|_{Z(E,V)}, there is a nonempty finite set A \subseteq E and a function
\alpha : A \to \{1, -1\} such that

(154.12)    \| \sum_{x \in A} \alpha(x) f(x) \| > (1 - \delta_1) \|f\|_{Z(E,V)}.

Let B be another nonempty finite subset of E that is disjoint from A, and let \beta
be a function on B with values in \{1, -1\}. If C, \gamma, and \gamma' are as in the previous
paragraph and

(154.13)    v = \sum_{x \in C} \gamma(x) f(x),  w = \sum_{x \in C} \gamma'(x) f(x),

then \|v\|, \|w\| \le \|f\|_{Z(E,V)}, and

(154.14)    \| (v + w)/2 \| = \| \sum_{x \in A} \alpha(x) f(x) \| > (1 - \delta_1) \|f\|_{Z(E,V)}.

Because of uniform convexity, we get that

(154.15)    2 \| \sum_{x \in B} \beta(x) f(x) \| = \|v - w\| < \epsilon \|f\|_{Z(E,V)}.

It follows that \sum_{x \in E} f(x) satisfies the generalized Cauchy condition, and hence
converges in the generalized sense when V is also complete.



155 Additional properties 



Let E be a nonempty set, and let V be a real or complex vector space with a
norm \|v\|. If a : E \to \{1, -1\} and f \in Y(E, V), then a f \in Y(E, V), and in fact

(155.1)    \|f\|_{Z(E,V)} = \sup_a \|a f\|_{Y(E,V)},

where the supremum is taken over all such mappings a. In particular,

(155.2)    \|b f\|_{Z(E,V)} = \|f\|_{Z(E,V)}

for every f \in Y(E, V) and b : E \to \{1, -1\}. If f \in Y(E, V) and b is a bounded
real-valued function on E, then b f \in Y(E, V), as in Section 153, and

(155.3)    \|b f\|_{Z(E,V)} \le \|b\|_\infty \|f\|_{Z(E,V)}.

This follows from the analogous statement for the Y(E, V) norm in Section 153
when b is nonnegative, and otherwise one can express b as the product of a
nonnegative function and a function with values in \{1, -1\}.

Suppose now that V is a complex vector space, and let T be the unit circle
in the complex plane, consisting of the complex numbers z with |z| = 1. If
a : E \to T and f \in Y(E, V), then a f \in Y(E, V) and

(155.4)    \|a f\|_{Y(E,V)} \le 4 \|f\|_{Y(E,V)},

as in Section 153. Put

(155.5)    \|f\|_{W(E,V)} = \sup_a \|a f\|_{Y(E,V)},

where the supremum is taken over all mappings a : E \to T. It is easy to see
that this is a norm on Y(E, V), and that

(155.6)    \|f\|_{Y(E,V)} \le \|f\|_{W(E,V)} \le 4 \|f\|_{Y(E,V)}

for every f \in Y(E, V). Equivalently,

(155.7)    \|f\|_{W(E,V)} = \sup_{B, \beta} \| \sum_{x \in B} \beta(x) f(x) \|,

where the supremum is taken over all nonempty finite sets B \subseteq E and functions
\beta : B \to T.

More precisely, one can also check that

(155.8)    \|f\|_{Z(E,V)} \le \|f\|_{W(E,V)} \le 2 \|f\|_{Z(E,V)}

for every f \in Y(E, V). The first inequality follows from the definitions and the
fact that 1, -1 \in T. The second inequality uses the estimate

(155.9)    \|a f\|_{Z(E,V)} \le 2 \|a\|_\infty \|f\|_{Z(E,V)}

for every bounded complex-valued function a on E and f \in Y(E, V). This
follows from (155.3) applied to the real and imaginary parts of a.
By construction,

(155.10)    \|b f\|_{W(E,V)} = \|f\|_{W(E,V)}

for every f \in Y(E, V) and b : E \to T. If b is a bounded complex-valued function
on E, then

(155.11)    \|b f\|_{W(E,V)} \le \|b\|_\infty \|f\|_{W(E,V)}

for every f \in Y(E, V). In the case where b is a bounded nonnegative real-valued
function on E, this follows from the corresponding statement for the
Y(E, V) norm in Section 153. Otherwise, one can express b as the product of
a nonnegative real-valued function and a function with values in T, to get the
same conclusion from the previous two cases.

156 Tori 

Let T be the unit circle in the complex plane, as before. It is well known that

(156.1)    \int_T z |dz| = 0,

where |dz| denotes the element of integration with respect to arc length. One
way to see this is to compare this integral with a line integral,

(156.2)    \int_T i z |dz| = \int_T dz = 0,

using the fact that the unit tangent vector to T at a point z \in T corresponds
to i z with respect to the standard orientation. Alternatively, one can use the
change of variables z \mapsto -z to get that

(156.3)    \int_T z |dz| = - \int_T z |dz|,

and hence that the integral is 0, because arc length is not affected by this
transformation.
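
The vanishing of the integral in (156.1) is easy to confirm numerically. The following is a minimal sketch using a Riemann sum over n equally spaced points on T, with |dz| replaced by the arc length 2\pi/n of each piece; the function name arc_length_integral is hypothetical.

```python
import cmath
import math


def arc_length_integral(g, n=1000):
    """Riemann sum for the integral of g(z) |dz| over the unit circle,
    using the n equally spaced points z_k = exp(2 pi i k / n)."""
    h = 2 * math.pi / n                      # arc length of each piece
    return sum(g(cmath.exp(1j * k * h)) for k in range(n)) * h


I_z = arc_length_integral(lambda z: z)       # approximates (156.1)
I_one = arc_length_integral(lambda z: 1.0)   # total arc length, 2 pi
```

The sum for g(z) = z is a sum of all n-th roots of unity times 2\pi/n, which vanishes up to rounding, while the constant function 1 recovers the total arc length 2\pi.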
Of course, T is a compact Hausdorff topological space, and a probability
space with respect to arc length measure divided by 2\pi. As usual, the
n-dimensional torus T^n is the Cartesian product of n copies of T, consisting
of ordered n-tuples z = (z_1, ..., z_n) with z_j \in T for j = 1, ..., n. This is
also a compact Hausdorff topological space for each n, and a probability space
with respect to the corresponding product measure. The coordinate functions
z_1, ..., z_n may be considered as complex-valued independent random variables
on T^n.

Similarly, we can consider the space T^\infty of sequences z = \{z_j\}_{j=1}^\infty such that
z_j \in T for each j, which is the Cartesian product of a sequence of copies of
T. This is a compact Hausdorff topological space with respect to the product
topology, and a probability space with respect to the product measure. The
coordinate functions z_1, z_2, ... form an infinite sequence of independent random
variables on this infinite-dimensional torus, as before. Note that the sequences
x = \{x_j\}_{j=1}^\infty with x_j = 1 or -1 for each j form a closed set in T^\infty.

Let (V, \|v\|) be a complex Banach space, and let \{v_j\}_{j=1}^\infty be a sequence of
elements of V. Consider the V-valued functions

(156.4)    f_n(z) = \sum_{j=1}^n z_j v_j

on T^\infty for each n \ge 1. If \sum_{j \in Z_+} v_j converges in the generalized sense, then
\{f_n\}_{n=1}^\infty converges uniformly on T^\infty. This is similar to the discussion in Section
151, using also the estimates in Section 153, or the W(Z_+, V) norm in the
previous section, which is basically the same. The converse statements discussed
in Section 151 are already applicable in this situation, because 1, -1 \in T.

157 Norms and linear functionals 

Let E be a nonempty set, and let V be a real or complex vector space with
a norm \|v\|. If f \in Y(E, V) and \lambda is a bounded linear functional on V, then
\lambda(f(x)) is a summable function on E, as in Section 30. Put

(157.1)    \|f\|_{L(E,V)} = \sup \{ \sum_{x \in E} |\lambda(f(x))| : \lambda \in V^*, \|\lambda\|_* \le 1 \}.

As in Section 30, this is less than or equal to 2 \|f\|_{Y(E,V)} in the real case, less
than or equal to 4 \|f\|_{Y(E,V)} in the complex case, and greater than or equal to
\|f\|_{Y(E,V)} in both cases. It is easy to see from the definition that \|f\|_{L(E,V)} is
a norm on Y(E, V), and that

(157.2)    \|b f\|_{L(E,V)} \le \|b\|_\infty \|f\|_{L(E,V)}

for every f \in Y(E, V) and bounded real or complex-valued function b on E, as
appropriate. If B \subseteq E is a finite set, \beta is a real or complex-valued function on
B such that |\beta(x)| = 1 for each x \in B, and \lambda \in V^*, then

(157.3)    | \lambda( \sum_{x \in B} \beta(x) f(x) ) | = | \sum_{x \in B} \beta(x) \lambda(f(x)) | \le \sum_{x \in B} |\lambda(f(x))|,

with equality in the last step for suitable choices of \beta. Using this, one can check
that \|f\|_{L(E,V)} is equal to \|f\|_{Z(E,V)} in the real case, and is equal to \|f\|_{W(E,V)}
in the complex case.
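
The choice of \beta that gives equality in (157.3) can be made explicit in the real case: take \beta(x) to be the sign of \lambda(f(x)). The following sketch does this for V = R^2; the linear functional lam and the values of f on a three-point set are hypothetical sample data.

```python
def lam(v):
    """A sample linear functional on R^2 (hypothetical choice)."""
    return 2.0 * v[0] - v[1]


# A V-valued function f on a three-point set B.
f = {"x1": (1.0, 0.0), "x2": (0.0, 3.0), "x3": (-1.0, -1.0)}

# beta(x) is the sign of lam(f(x)), so that beta(x) lam(f(x)) = |lam(f(x))|.
beta = {x: (1 if lam(v) >= 0 else -1) for x, v in f.items()}

# The signed sum  sum_{x in B} beta(x) f(x)  in R^2.
signed_sum = tuple(sum(beta[x] * f[x][i] for x in f) for i in range(2))

lhs = lam(signed_sum)                      # lam applied to the signed sum
rhs = sum(abs(lam(v)) for v in f.values()) # sum of |lam(f(x))|
```

With these sample values the signed sum is (2, -2), and both sides are equal to 6.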



158 Sums and Co(E) 

Let £ be a nonempty set, and let (V, \\v\\) be a real or complex Banach space. 
If / € Y(E, V) and a is a bounded real or complex-valued function on E, as 
appropriate, then af £ Y(E, V) and 

(158.1) \\af\\YtE,v)<Ha\\°°\\f\W{E,v) 
in the real case, and 

(158.2) \\af\\ Y (E,v) < i\\a\\oo\\f\\Y(E,v) 

in the complex case, as in Section ["153l If a € cq(E), then it follows that a f is in 
Yq(E, V), since a can be approximated by functions with finite support in the £°° 
norm. This is the same as saying that YjxeE a i x ) f( x ) satisfies the generalized 
Cauchy criterion when a £ cq(E), and hence converges in the generalized sense 
because V is complete. Thus 

(158.3) T_f(a) = \sum_{x \in E} a(x) f(x)

defines a bounded linear mapping from $c_0(E)$ into $V$. One can check that the
operator norm of $T_f$ is equal to the $Z(E, V)$ norm of $f$ in the real case, and
is equal to the $W(E, V)$ norm of $f$ in the complex case. Conversely, if $T$ is a
bounded linear mapping from $c_0(E)$ into $V$, then $T = T_f$ for some $f \in Y(E, V)$.
To see this, one can take

(158.4) f(x) = T(\delta_x),

where $\delta_x$ is the function on $E$ defined by $\delta_x(x) = 1$ and
$\delta_x(y) = 0$ when $x \ne y$. If $a$ is a real or complex-valued function on $E$
with finite support, then $a$ is a linear combination of finitely many $\delta_x$'s,
and so $T(a)$ is given by the same expression as $T_f(a)$, because of linearity.
Using this and the boundedness of $T$, one can show that $f \in Y(E, V)$, and more
precisely that the $Z(E, V)$ norm of $f$ is less than or equal to the operator norm
of $T$ in the real case, and that the $W(E, V)$ norm of $f$ is less than or equal to
the operator norm of $T$ in the complex case. This implies that $T(a) = T_f(a)$ for
every $a \in c_0(E)$, because $T$ and $T_f$ are bounded linear operators which agree
on the dense linear subspace of $c_0(E)$ consisting of functions $a$ with finite
support.



159 Integrability



Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $(V, \|v\|)$ be a real or
complex Banach space. As in Section 120, it is easy to deal with integration of
functions with values in a finite-dimensional subspace of $V$. Suppose that
$\{f_j\}_{j=1}^\infty$ is a sequence of $V$-valued functions on $X$ such that each
$f_j$ takes values in a finite-dimensional subspace of $V$, each $f_j$ is integrable
in the sense of Section 120, and

(159.1) \lim_{j, l \to \infty} \int_X \|f_j - f_l\| \, d\mu = 0.
This implies in particular that the sequence of integrals 

(159.2) \int_X f_j \, d\mu

is a Cauchy sequence in V, and hence converges in V, by completeness. 
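
The Cauchy property of the integrals (159.2) comes from a one-line estimate. This sketch assumes that the norm inequality for integrals of functions with values in a finite-dimensional subspace is available from Section 120. It applies here because $f_j - f_l$ takes values in the sum of two finite-dimensional subspaces.

```latex
% Assumes \| \int_X g \, d\mu \| \le \int_X \|g\| \, d\mu for integrable g
% with values in a finite-dimensional subspace of V (taken from Section 120).
\begin{align*}
\Big\| \int_X f_j \, d\mu - \int_X f_l \, d\mu \Big\|
  = \Big\| \int_X (f_j - f_l) \, d\mu \Big\|
  \le \int_X \|f_j - f_l\| \, d\mu \longrightarrow 0
  \quad \text{as } j, l \to \infty, \text{ by (159.1)}.
\end{align*}
```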
A sufficient condition for this type of convergence to hold is that 

(159.3) \sum_{j=1}^\infty \int_X \|f_j - f_{j+1}\| \, d\mu < \infty.



This is the same as 



(159.4) \int_X \sum_{j=1}^\infty \|f_j - f_{j+1}\| \, d\mu < \infty,

which implies that 

(159.5) \sum_{j=1}^\infty \|f_j(x) - f_{j+1}(x)\| < \infty

for almost every $x \in X$. It follows that

(159.6) \sum_{j=1}^\infty (f_j(x) - f_{j+1}(x))

converges in $V$ for almost every $x \in X$, by completeness again. Put

(159.7) f(x) = \lim_{j \to \infty} f_j(x),

which exists for almost every $x \in X$ by the convergence of the previous sum. Of
course, any sequence of V-valued functions as in the preceding paragraph has 
a subsequence that satisfies this summability condition, and hence converges 
almost everywhere. 
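
One possible way to pick such a subsequence, and to see why summability of the differences gives pointwise convergence, is sketched here. The indices $j_k$ are notation introduced for this illustration only.

```latex
% Choice of a subsequence: by (159.1), pick j_1 < j_2 < \cdots such that
\begin{align*}
\int_X \|f_j - f_l\| \, d\mu \le 2^{-k} \quad \text{for all } j, l \ge j_k,
\qquad \text{so that} \qquad
\sum_{k=1}^\infty \int_X \|f_{j_k} - f_{j_{k+1}}\| \, d\mu
  \le \sum_{k=1}^\infty 2^{-k} = 1 < \infty.
\end{align*}
% Telescoping: partial sums of the series of differences recover the sequence,
\begin{align*}
f_{j_n}(x) = f_{j_1}(x) - \sum_{k=1}^{n-1} \big( f_{j_k}(x) - f_{j_{k+1}}(x) \big),
\end{align*}
% so convergence of the series of differences at x is equivalent to
% convergence of f_{j_n}(x) as n \to \infty.
```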

Under these conditions, put 



(159.8) \int_X f \, d\mu = \lim_{j \to \infty} \int_X f_j \, d\mu.

This is basically the definition of the Bochner integral. Note that
$\{\|f_j(x)\|\}_{j=1}^\infty$ converges in $L^1(X)$ to $\|f(x)\|$, which implies that



(159.9) \Big\| \int_X f \, d\mu \Big\| \le \int_X \|f\| \, d\mu.



Similarly, if $\lambda$ is a bounded linear functional on $V$, then $\lambda(f_j(x))$
converges in $L^1(X)$ to $\lambda(f(x))$, and hence

(159.10) \lambda\Big( \int_X f \, d\mu \Big) = \int_X \lambda \circ f \, d\mu.

This shows that the integral of $f$ does not depend on the particular sequence of
approximations, because bounded linear functionals on $V$ separate points, by the
Hahn-Banach theorem.

Remember that a function on X with values in a topological space is said to 
be measurable if the inverse image of every open set in the range is measurable. 
Thus the composition of a measurable function with a continuous mapping to 
another topological space is also measurable. If $f : X \to V$ is measurable with
respect to the topology on $V$ associated to the norm, then it follows that
$\|f(x)\|$ is measurable too. If in addition $\|f(x)\|$ is integrable and $V$ is
separable, then $f$ can be approximated by integrable functions with values in
finite-dimensional subspaces of $V$, as before. To see this, one can start by using
the integrability of $\|f(x)\|$ to approximate $f$ by bounded measurable $V$-valued
functions that are equal to $0$ on the complements of suitable subsets of finite
measure. One can then use the separability of $V$ to approximate these functions by
$V$-valued simple functions. The same argument would work if $f$ takes values in a
separable subspace of $V$ almost everywhere on $X$.



160 Bounded measures 

Let $X$ be a set, let $\mathcal{A}$ be an algebra of subsets of $X$, and let $V$ be
a real or complex vector space. A $V$-valued function $\mu$ on $\mathcal{A}$ is said
to be a finitely-additive $V$-valued measure on $(X, \mathcal{A})$ if

(160.1) \mu(A \cup B) = \mu(A) + \mu(B)

for every $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$. If
$A_1, \ldots, A_n \in \mathcal{A}$ and $t_1, \ldots, t_n \in \mathbf{R}$ or $\mathbf{C}$,
as appropriate, then

(160.2) f(x) = \sum_{j=1}^n t_j \, \mathbf{1}_{A_j}(x)

is a measurable simple function on X, and we put 



(160.3) \int_X f \, d\mu = \sum_{j=1}^n t_j \, \mu(A_j).




It is easy to see that this does not depend on the particular representation of
$f$ as a linear combination of indicator functions, and that it defines a linear
mapping from the vector space of measurable simple functions on $X$ into $V$. If
$\lambda$ is a linear functional on $V$, then $\mu_\lambda(A) = \lambda(\mu(A))$ is a
finitely-additive real or complex measure on $(X, \mathcal{A})$, as appropriate, and



(160.4) \lambda\Big( \int_X f \, d\mu \Big) = \int_X f \, d\mu_\lambda.

Suppose now that $V$ is equipped with a norm $\|v\|$, and that $\mu$ is bounded,
so that

(160.5) C(\mu) = \sup \{ \|\mu(A)\| : A \in \mathcal{A} \} < \infty.

If $A_1, \ldots, A_n$ are finitely many pairwise-disjoint measurable subsets of $X$
and $E_n = \{1, \ldots, n\}$, then $\mu(A_j)$ may be considered as a $V$-valued
function on $E_n$ whose $Y(E_n, V)$ norm is less than or equal to $C(\mu)$, because
of the finite additivity of $\mu$. As in Section 153, it follows that



(160.6) \Big\| \int_X f \, d\mu \Big\| \le k \, C(\mu) \sup_{x \in X} |f(x)|



for every measurable simple function $f$ on $X$, where $k = 1$ when $f$ is
real-valued and nonnegative, $k = 2$ when $f$ is real-valued, and $k = 4$ when $f$
is complex-valued. If $\mathcal{A}$ is a $\sigma$-algebra and $V$ is complete, then
the integral can be extended to bounded measurable real or complex-valued functions
$f$ on $X$, as appropriate, because simple functions are dense in the space of
bounded measurable functions with respect to the supremum norm. If $\lambda$ is a
bounded linear functional on $V$, then $\mu_\lambda$ is a bounded finitely-additive
real or complex measure on $(X, \mathcal{A})$, with

(160.7) C(\mu_\lambda) \le \|\lambda\|_* \, C(\mu),

and we get the same relationship with the integral of a bounded measurable 
function as for simple functions. 
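
The claim above that the $Y(E_n, V)$ norm of $j \mapsto \mu(A_j)$ is at most $C(\mu)$ can be spelled out as follows. This is a sketch that again assumes the $Y$ norm is the supremum of the norms of finite sub-sums. The index set $I$ is notation introduced here.

```latex
% Assumed: \|g\|_{Y(E_n,V)} = \max \{ \| \sum_{j \in I} g(j) \| : I \subset E_n \}.
% For pairwise-disjoint A_1, \ldots, A_n \in \mathcal{A} and any I \subset E_n,
% finite additivity (160.1) gives
\begin{align*}
\sum_{j \in I} \mu(A_j) = \mu\Big( \bigcup_{j \in I} A_j \Big),
\qquad \text{hence} \qquad
\Big\| \sum_{j \in I} \mu(A_j) \Big\| \le C(\mu).
\end{align*}
% For a simple function f = \sum_{j=1}^n t_j \mathbf{1}_{A_j} with the A_j
% nonempty and pairwise disjoint, the estimate cited from Section 153 then yields
\begin{align*}
\Big\| \int_X f \, d\mu \Big\|
  = \Big\| \sum_{j=1}^n t_j \, \mu(A_j) \Big\|
  \le k \, \max_{1 \le j \le n} |t_j| \; C(\mu)
  \le k \, C(\mu) \sup_{x \in X} |f(x)|.
\end{align*}
```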

In particular, this works when $\mathcal{A}$ is a $\sigma$-algebra and $\mu$ is
countably additive, in the sense that

(160.8) \sum_{j=1}^\infty \mu(A_j) = \mu\Big( \bigcup_{j=1}^\infty A_j \Big)

for every sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of
$X$, as in Section 37. More precisely, convergence of the series on the left in $V$
is part of the hypothesis, and we have seen that this implies that $\mu$ is bounded.
In this case, $\mu_\lambda$ is a countably-additive real or complex measure on
$(X, \mathcal{A})$ for each $\lambda \in V^*$, as before. If $\mu$ has the additional
property that $\sum_{j=1}^\infty \|\mu(A_j)\|$ converges for every sequence
$A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of $X$, then one can
integrate any $f \in L^1(X, \|\mu\|)$, as in Section 133. If instead $V$ is a Hilbert
space and $\mu(A)$ is orthogonal to $\mu(B)$ when $A$, $B$ are disjoint measurable
subsets of $X$, then the integral can be defined on a suitable $L^2$ space, as in
Section 153.

As another situation like this, suppose that $V = W^*$ for some Banach space $W$,
$\mathcal{A}$ is a $\sigma$-algebra, and $\mu$ is countably additive with convergence
in the weak* topology on $V$. This implies that $\mu_w(A) = \mu(A)(w)$ is a
countably-additive real or complex measure on $(X, \mathcal{A})$ for each $w \in W$.
This is the same as the measure $\mu_\lambda$ defined before, where $\lambda$ is the
bounded linear functional on $V$ corresponding to evaluation at $w$. Using the
uniform boundedness principle, one can show that $\mu$ is bounded, as in Section 37.
Under these conditions, the integral of a bounded measurable function $f$ on $X$
can be defined more directly as a bounded linear functional on $W$ by

(160.9) \Big( \int_X f \, d\mu \Big)(w) = \int_X f \, d\mu_w,

which is also satisfied by the previous definition. 

161 Weak* measurability 

Let $W$ be a real or complex vector space with a norm $\|w\|_W$ with respect to
which $W$ is separable, and let $w_1, w_2, \ldots$ be a sequence of elements of $W$
such that $\|w_j\|_W = 1$ for each $j$ and the set of $w_j$'s is dense in the unit
sphere in $W$. This uses the fact that a subset of a separable metric space is also
separable. Note that

(161.1) \|\lambda\|_{W^*} = \sup_{j \ge 1} |\lambda(w_j)|

for every bounded linear functional $\lambda$ on $W$. Also let $(X, \mathcal{A})$ be
a measurable space, and let $f$ be a function on $X$ with values in $W^*$. If
$f(x)(w)$ is measurable as a real or complex-valued function on $X$ for every
$w \in W$, then it follows that $\|f(x)\|_{W^*}$ is measurable on $X$ as well.
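
The measurability of the norm follows from (161.1) by writing each superlevel set as a countable union. A brief sketch, with the threshold $t$ introduced here for illustration:

```latex
% By (161.1), \|f(x)\|_{W^*} = \sup_{j \ge 1} |f(x)(w_j)| for each x \in X.
% Hence, for every real number t,
\begin{align*}
\{ x \in X : \|f(x)\|_{W^*} > t \}
  = \bigcup_{j=1}^\infty \{ x \in X : |f(x)(w_j)| > t \},
\end{align*}
% a countable union of measurable sets, since each x \mapsto f(x)(w_j)
% is measurable by hypothesis.
```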

Let us say that $f : X \to W^*$ is weak* measurable if $f$ is measurable with
respect to the weak* topology on $W^*$. This automatically implies that $f(x)(w)$
is measurable for each $w \in W$, since evaluation at $w$ is a continuous function
on $W^*$. Conversely, $f$ is weak* measurable when $f(x)(w)$ is measurable for
every $w \in W$ and $W$ is separable. To see this, one may as well suppose that $f$
is bounded, because one can use the measurability of $\|f(x)\|_{W^*}$ to express $X$
as the union of a sequence of measurable sets on which $f$ is bounded. If $B$ is a
ball in $W^*$, then the topology on $B$ induced by the weak* topology on $W^*$ is
metrizable, because $W$ is separable, as in Section 53. If $B$ is a closed ball in
$W^*$, then $B$ is also compact in the weak* topology, by the Banach-Alaoglu
theorem. Thus $B$ is compact and metrizable with respect to the topology induced by
the weak* topology, and hence is separable with respect to this topology. This
implies that relatively open subsets of $B$ in the weak* topology can be given in
terms of countable unions of basic open sets, which permits the weak* measurability
of $f$ to be obtained from the measurability of $f(x)(w)$ for each $w \in W$.

Of course, $f$ is weak* measurable if $f$ is measurable with respect to the
topology on $W^*$ associated to the dual norm, because every open set in $W^*$ with
respect to the weak* topology is also open in the norm topology. Conversely, if
$f$ is weak* measurable and $W^*$ is separable, then $f$ is measurable with respect
to the norm topology on $W^*$. Indeed, separability of $W^*$ implies that each open
set $U \subset W^*$ in the norm topology is a countable union of closed balls. If
$B$ is a closed ball in $W^*$, then $B$ is a closed set in $W^*$ in the weak*
topology by the definition of the dual norm, and so $f^{-1}(B)$ is measurable in $X$
by weak* measurability. It follows that $f^{-1}(U)$ is the union of countably many
measurable subsets of $X$, and hence is measurable.

Similarly, if $V$ is a real or complex vector space with a norm $\|v\|_V$, then
we say that $f : X \to V$ is weakly measurable if $f$ is measurable with respect
to the weak topology on $V$. If $f$ is measurable with respect to the topology
on $V$ associated to the norm, then $f$ is weakly measurable, because every open
set in $V$ with respect to the weak topology is also an open set in the norm
topology. Conversely, if $f$ is weakly measurable and $V$ is separable, then $f$ is
measurable with respect to the norm topology on $V$. As before, separability of $V$
implies that every open set $U \subset V$ in the norm topology is the countable
union of closed balls. In this case, the fact that a closed ball $B$ in $V$ is also
closed in the weak topology uses the Hahn-Banach theorem. If $f$ is weakly
measurable, then it follows that $f^{-1}(B)$ is a measurable set in $X$ for each
closed ball $B$ in $V$, and hence that $f^{-1}(U)$ is measurable in $X$ for each open
set $U \subset V$ in the norm topology. If $V^*$ is separable, then one can argue as
before that $\|f(x)\|$ is measurable on $X$ when $\lambda(f(x))$ is measurable for
each $\lambda \in V^*$. The same argument shows that $\|f(x) - v\|$ is measurable on
$X$ for every $v \in V$ under these conditions, so that $f^{-1}(B)$ is measurable in
$X$ for each ball $B$ in $V$. One can then use separability of $V$ again to get that
$f$ is measurable with respect to the norm topology on $V$.



162 Weak* measures 



Let $(X, \mathcal{A})$ be a measurable space, and let $(W, \|w\|_W)$ be a real or
complex Banach space. Let us say that a function $\mu$ on $\mathcal{A}$ with values
in the dual $W^*$ of $W$ is a weak* measure if

(162.1) \sum_{j=1}^\infty \mu(A_j) = \mu\Big( \bigcup_{j=1}^\infty A_j \Big)

for every sequence $A_1, A_2, \ldots$ of pairwise-disjoint measurable subsets of
$X$, where the series is supposed to converge in the weak* topology on $W^*$. This
is equivalent to asking that $\mu$ be finitely additive, and that

(162.2) \lim_{j \to \infty} \mu(B_j) = \mu\Big( \bigcup_{j=1}^\infty B_j \Big)

in the weak* topology for every increasing sequence $B_1, B_2, \ldots$ of measurable
subsets of $X$. This is also equivalent to the condition that $\mu$ be finitely
additive and satisfy

(162.3) \lim_{j \to \infty} \mu(C_j) = \mu\Big( \bigcap_{j=1}^\infty C_j \Big)

in the weak* topology for every decreasing sequence $C_1, C_2, \ldots$ of measurable
subsets of $X$. This is also the same as saying that



(162.4) \mu_w(A) = \mu(A)(w)

is a countably-additive real or complex measure on $(X, \mathcal{A})$, as
appropriate, for every $w \in W$.

Remember that convergent sequences in W* in the weak* topology are 
bounded with respect to the dual norm when W is complete, by the theorem of 
Banach and Steinhaus. If $\mu$ is a weak* measure on $(X, \mathcal{A})$ with values
in $W^*$, then there is a $C \ge 0$ such that

(162.5) \|\mu(A)\|_{W^*} \le C

for every $A \in \mathcal{A}$, by the same arguments as in Section 37. Equivalently,

(162.6) |\mu_w(A)| \le C \, \|w\|_W

for every $w \in W$ and $A \in \mathcal{A}$, which implies that

(162.7) |\mu_w|(X) \le k \, C \, \|w\|_W

for every $w \in W$, where $k = 2$ in the real case and $k = 4$ in the complex
case. Thus $w \mapsto \mu_w$ defines a bounded linear mapping from $W$ into the
space of real or complex measures on $(X, \mathcal{A})$, as appropriate, equipped
with the norm associated to the total variation. Conversely, a bounded linear
mapping from $W$ into the space of real or complex measures on $(X, \mathcal{A})$
determines a weak* measure on $(X, \mathcal{A})$ with values in $W^*$ in this way.
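
The constants $k = 2$ and $k = 4$ in (162.7) can be recovered from (162.6) by a standard argument. The sketch below uses a Hahn decomposition $X = P \cup N$ for the real measure $\mu_w$. The symbols $P$ and $N$ are introduced here and do not appear in the text.

```latex
% Real case: let X = P \cup N be a Hahn decomposition for \mu_w, so that
% \mu_w \ge 0 on measurable subsets of P and \mu_w \le 0 on those of N.
\begin{align*}
|\mu_w|(X) = \mu_w(P) - \mu_w(N)
  \le |\mu_w(P)| + |\mu_w(N)|
  \le 2 \, C \, \|w\|_W \quad \text{by (162.6)}.
\end{align*}
% Complex case: the real and imaginary parts of \mu_w are real measures with
% |\mathrm{Re}\, \mu_w(A)|, \ |\mathrm{Im}\, \mu_w(A)| \le |\mu_w(A)| \le C \|w\|_W,
% so the real case applies to each of them, and
\begin{align*}
|\mu_w|(X) \le |\mathrm{Re}\, \mu_w|(X) + |\mathrm{Im}\, \mu_w|(X)
  \le 4 \, C \, \|w\|_W.
\end{align*}
```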

If $\nu$ is a nonnegative measure on $(X, \mathcal{A})$ and $f \in L^1(X, \nu)$, then

(162.8) \nu_f(A) = \int_A f \, d\nu

defines a real or complex measure on $(X, \mathcal{A})$, as appropriate. Thus a
bounded linear mapping from $W$ into $L^1(X, \nu)$ determines a weak* measure $\mu$
on $(X, \mathcal{A})$ with values in $W^*$, as in the previous paragraph. In this
case, $\mu$ is absolutely continuous with respect to $\nu$, in the sense that
$\mu(A) = 0$ for every measurable set $A \subset X$ with $\nu(A) = 0$, because
$\nu_f$ is absolutely continuous with respect to $\nu$ for every $f \in L^1(X, \nu)$.
Of course, any weak* measure $\mu$ on $(X, \mathcal{A})$ with values in $W^*$ is
absolutely continuous with respect to $\nu$ in this sense if and only if $\mu_w$ is
absolutely continuous with respect to $\nu$ for each $w \in W$. If $\nu$ is
$\sigma$-finite, then the Radon-Nikodym theorem implies that every weak* measure
$\mu$ on $(X, \mathcal{A})$ that is absolutely continuous with respect to $\nu$
corresponds to a bounded linear mapping from $W$ into $L^1(X, \nu)$.

If $E$ is a nonempty set, then $Y(E, W^*)$ can be identified with the space of
bounded linear mappings from $W$ into $\ell^1(E)$. This is basically another way of
looking at the discussion in Section 32. We can also think of $\ell^1(E)$ as being
the $L^1$ space associated to counting measure on $E$, so that elements of
$\ell^1(E)$ determine real or complex measures on $E$ as in the preceding paragraph.
More precisely, these are measures defined on arbitrary subsets of $E$. It follows
that elements of $Y(E, W^*)$ determine bounded linear mappings from $W$ into real or
complex measures on $E$, as appropriate, and hence weak* measures on $E$ with
values in $W^*$.

163 Weak* integrability 

Let $(X, \mathcal{A}, \nu)$ be a measure space, and let $W$ be a real or complex
vector space with a norm $\|w\|_W$. Also let $f$ be a $W^*$-valued function on $X$
such that $f(x)(w)$ is measurable on $X$ for each $w \in W$. If $W$ is separable,
then it follows that $\|f(x)\|_{W^*}$ is measurable on $X$, as in Section 161.
Alternatively, if $f : X \to W^*$ is measurable with respect to the weak* topology
on $W^*$, then we get that $f(x)(w)$ is measurable for each $w \in W$ and that
$\|f(x)\|_{W^*}$ is measurable. The latter uses the fact that closed balls in $W^*$
are closed sets in the weak* topology, by definition of the dual norm.

At any rate, if $\|f(x)\|_{W^*}$ is integrable with respect to $\nu$, then $f(x)(w)$
is also integrable with respect to $\nu$ for each $w \in W$, and

(163.1) \int_X |f(x)(w)| \, d\nu(x) \le \|w\|_W \int_X \|f(x)\|_{W^*} \, d\nu(x)

for every $w \in W$. In particular, $w \mapsto f(x)(w)$ is a bounded linear mapping
from $W$ into $L^1(X, \nu)$, which leads to a weak* measure $\mu$ on
$(X, \mathcal{A})$ with values in $W^*$, as in the previous section. More precisely,

(163.2) \mu_w(A) = \mu(A)(w) = \int_A f(x)(w) \, d\nu(x)

for every measurable set $A \subset X$ and $w \in W$, which implies that

(163.3) \|\mu(A)\|_{W^*} \le \int_A \|f(x)\|_{W^*} \, d\nu.

If $A_1, A_2, \ldots$ is a sequence of pairwise-disjoint measurable subsets of $X$,
then it is easy to see that $\sum_{j=1}^\infty \mu(A_j)$ converges absolutely with
respect to the dual norm on $W^*$, and that the sum is equal to
$\mu(\bigcup_{j=1}^\infty A_j)$.
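
The absolute convergence and the value of the sum can be verified directly from (163.2) and (163.3). A short sketch, where $A$ denotes the union of the $A_j$'s (notation introduced here):

```latex
% Let A = \bigcup_{j=1}^\infty A_j, with the A_j pairwise disjoint and measurable.
% Absolute convergence in the dual norm, using (163.3):
\begin{align*}
\sum_{j=1}^\infty \|\mu(A_j)\|_{W^*}
  \le \sum_{j=1}^\infty \int_{A_j} \|f(x)\|_{W^*} \, d\nu(x)
  = \int_A \|f(x)\|_{W^*} \, d\nu(x) < \infty.
\end{align*}
% Value of the sum, tested against each w \in W using (163.2) and
% countable additivity of the integral of the integrable function f(x)(w):
\begin{align*}
\sum_{j=1}^\infty \mu(A_j)(w)
  = \sum_{j=1}^\infty \int_{A_j} f(x)(w) \, d\nu(x)
  = \int_A f(x)(w) \, d\nu(x)
  = \mu(A)(w).
\end{align*}
```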

Let $\|\mu\|$ be the total variation measure associated to $\mu$ as in Section I3T1.
Thus $\|\mu\|(A) = p^*(A)$ corresponds to $p(A) = \|\mu(A)\|_{W^*}$ as in Section 55.
In this case,

(163.4) \|\mu\|(A) \le \int_A \|f(x)\|_{W^*} \, d\nu(x)

for each $A \in \mathcal{A}$, because of (163.3). Of course,

(163.5) |\mu_w(A)| \le \|\mu(A)\|_{W^*} \, \|w\|_W \le \|\mu\|(A) \, \|w\|_W

for every $A \in \mathcal{A}$ and $w \in W$, which implies that

(163.6) |\mu_w|(A) \le \|\mu\|(A) \, \|w\|_W,

where $|\mu_w|$ is the total variation measure associated to $\mu_w$. Hence

(163.7) \int_A |f(x)(w)| \, d\nu(x) \le \|\mu\|(A) \, \|w\|_W

for every $A \in \mathcal{A}$ and $w \in W$.

Suppose that $W$ is separable, and let $w_1, w_2, \ldots$ be a sequence of elements
of $W$ such that $\|w_j\|_W = 1$ for each $j$ and the set of $w_j$'s is dense in the
unit sphere in $W$. If

(163.8) \phi_n(x) = \max_{1 \le j \le n} |f(x)(w_j)|,

then $\phi_n(x)$ is measurable on $X$ for each $n$,

(163.9) \phi_n(x) \le \phi_{n+1}(x) \le \|f(x)\|_{W^*}

for each $x \in X$ and $n \ge 1$, and

(163.10) \lim_{n \to \infty} \phi_n(x) = \sup_{n \ge 1} \phi_n(x) = \|f(x)\|_{W^*}



for each $x \in X$. Let $A \subset X$ be a measurable set, and let $A_1, \ldots, A_n$
be pairwise-disjoint measurable subsets of $X$ such that $\bigcup_{j=1}^n A_j = A$.
Observe that

(163.11) \sum_{j=1}^n \int_{A_j} |f(x)(w_j)| \, d\nu(x) \le \sum_{j=1}^n \|\mu\|(A_j) = \|\mu\|(A).

This implies that

(163.12) \int_A \phi_n(x) \, d\nu(x) \le \|\mu\|(A)

for each n. Using the monotone convergence theorem, we get that 

(163.13) \int_A \|f(x)\|_{W^*} \, d\nu(x) \le \|\mu\|(A).
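
The passage from (163.11) to (163.12) depends on a suitable choice of the sets $A_j$. One way to make that choice, offered here as a sketch rather than as the author's own argument, is to let $A_j$ be the part of $A$ where the maximum in (163.8) is first attained at index $j$.

```latex
% For fixed n and measurable A \subset X, put, for 1 \le j \le n,
\begin{align*}
A_j = \{ x \in A : |f(x)(w_j)| = \phi_n(x) \text{ and }
        |f(x)(w_i)| < \phi_n(x) \text{ for } 1 \le i < j \}.
\end{align*}
% These sets are measurable, pairwise disjoint, and their union is A,
% and \phi_n(x) = |f(x)(w_j)| for x \in A_j. Therefore
\begin{align*}
\int_A \phi_n(x) \, d\nu(x)
  = \sum_{j=1}^n \int_{A_j} |f(x)(w_j)| \, d\nu(x)
  \le \|\mu\|(A) \quad \text{by (163.11)}.
\end{align*}
% Since \phi_n increases pointwise to \|f(x)\|_{W^*} by (163.9) and (163.10),
% the monotone convergence theorem gives (163.13).
```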

It follows that 

(163.14) \|\mu\|(A) = \int_A \|f(x)\|_{W^*} \, d\nu(x)

for every $A \in \mathcal{A}$ when $W$ is separable.